Re: [HACKERS] assorted code cleanup
On Wed, Sep 6, 2017 at 4:12 AM, Peter Eisentraut wrote:
> On 8/21/17 01:11, Michael Paquier wrote:
>>> - Drop excessive dereferencing of function pointers
>>
>> - (*next_ProcessUtility_hook) (pstmt, queryString,
>> + next_ProcessUtility_hook(pstmt, queryString,
>>   context, params, queryEnv,
>>   dest, completionTag);
>> But this... Personally I like the current grammar, which allows one to
>> tell the difference between a call to something declared locally and
>> something that may go through a custom code path. So I think that you
>> had better not update the system hooks that external modules can use
>> via shared_preload_libraries.
>
> Do you mean specifically the hook variables, or any function pointers?
> I can see your point in the above case, but for example here
>
> - if ((*tinfo->f_lt) (o.upper, c.upper, flinfo))
> + if (tinfo->f_lt(o.upper, c.upper, flinfo))
>
> I think there is no loss of clarity and the extra punctuation makes it
> more complicated to read.

I am referring only to hook variables here. For functions used only internally by the backend, I agree that calling through a plain pointer makes things better, because it then remains easy to distinguish those calls from the hook paths. Keeping a different style for local code and hook code lets readers see clearly that the two are different concepts.

-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Hello Alik,

Applies, compiles, works for me. Some minor comments and suggestions.

Two typos:
- "usinng" -> "using"
- "a rejection method used" -> "a rejection method is used"

I'm not sure about "least_recently_used_i"; this naming style is not used in pgbench. "least_recently_used" would be ok.

"..nb_cells.. != ZIPF_CACHE_SIZE": ISTM that "<" is more logical, even if the result is the same?

I would put the parameter value check in getZipfianRand, so that if someone reuses the function elsewhere the check is also performed. That would also simplify a bit the already very large expression evaluation function.

When/if the pgbench TAP test patch gets through, coverage tests should be added.

Maybe the cache overflow could be counted, to allow for a possible warning message in the final report?

-- 
Fabien.
Re: [HACKERS] cache lookup errors for missing replication origins
On Tue, Sep 5, 2017 at 12:59 PM, Michael Paquier wrote:
> ERROR: 42704: replication slot "%s" does not exist

s/slot/origin/

> As far as I can see, replorigin_by_oid makes no use of its missing_ok
> = false in the backend code, so leaving it untouched would have no
> impact. replorigin_by_name with missing_ok = false is only used with
> SQL-callable functions, so it could be changed without any impact
> elsewhere (without considering externally-maintained replication
> modules).

Before I forget, attached is a patch, added as well to the next CF. replorigin_by_oid should get the same switch from elog to ereport, in my opinion. Additional regression tests are included.

-- 
Michael

replorigin-elogs.patch
Description: Binary data
Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction
On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher wrote:
> Arseny Sher writes:
>
>> Attached patch fixes this by stopping workers before RO drop, as
>> already done in case when we drop replication slot.
>
> Sorry, here is the patch.

I could reproduce this issue; it's a bug. I have added it to the open items. The cause is commit 7e174fa7, which makes subscription DDLs kill the apply worker only at transaction commit. I haven't looked at the patch deeply yet, but I'm not sure that the approach of killing the apply worker before commit is a good one.

Regards,

-- 
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
[HACKERS] Re: Tuple-routing for certain partitioned tables not working as expected
On Tue, Sep 05, 2017 at 08:35:13PM +0900, Etsuro Fujita wrote:
> On 2017/08/30 17:20, Etsuro Fujita wrote:
>> On 2017/08/30 9:13, Amit Langote wrote:
>>> On 2017/08/29 20:18, Etsuro Fujita wrote:
>>>> On 2017/08/25 22:26, Robert Haas wrote:
>>>>> On Wed, Aug 23, 2017 at 4:55 AM, Etsuro Fujita wrote:
>>>>>> Agreed, but I'd vote for fixing this in v10 as proposed; I agree
>>>>>> that just ripping the CheckValidResultRel checks out entirely is
>>>>>> not a good idea, but that seems OK to me at least as a fix just
>>>>>> for v10.
>>>>>
>>>>> I'm still not on-board with having this be the one case where we
>>>>> don't do CheckValidResultRel. If we want to still call it but pass
>>>>> down some additional information that can selectively skip certain
>>>>> checks, I could probably live with that.
>>>>
>>>> Another idea would be to not do CheckValidResultRel for partitions
>>>> in ExecSetupPartitionTupleRouting; instead, do that the first time
>>>> the partition is chosen by ExecFindPartition, and if successfully
>>>> checked, initialize the partition's ResultRelInfo and other stuff.
>>>> (We could skip this after the first time by checking whether we
>>>> already have a valid ResultRelInfo for the chosen partition.) That
>>>> could allow us to not only call CheckValidResultRel the way it is,
>>>> but avoid initializing useless partitions for which tuples aren't
>>>> routed at all.
>>>
>>> I too have thought about the idea of lazy initialization of the
>>> partition ResultRelInfos. I think it would be a good idea, but
>>> definitely something for PG 11.
>>
>> Agreed. Here is a patch to skip the CheckValidResultRel checks for a
>> target table if it's a foreign partition to perform tuple-routing
>> for, as proposed by Robert.
>
> I'll add this to the open items list for v10.

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.
Robert, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] CLUSTER command progress monitor
Hi Hackers,

I revised the patch like this:

- Add a "command" column in the view.
  It tells whether the running command is CLUSTER or VACUUM FULL.
- Enable the VACUUM FULL progress monitor.
  Add heap_tuples_vacuumed and heap_tuples_recently_dead as counters in the view.

The sequence of phases is below:

1. scanning heap
5. swapping relation files
6. rebuild index
7. performing final cleanup

I didn't change the name of the view (pg_stat_progress_cluster) because I'm not sure whether the new name (pg_stat_progress_reorg) is suitable or not. Any comments or suggestions are welcome.

Thanks,
Tatsuro Yamada

On 2017/09/04 20:17, Tatsuro Yamada wrote:
On 2017/09/04 15:38, Michael Paquier wrote:
On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada wrote:

Then I have questions.

* Should we have separate views for them? Or should both be covered by the same view with some indication of which command (CLUSTER or VACUUM FULL) is actually running?

Using the same view for both, and telling whether this is VACUUM or CLUSTER in the view, would be better IMO. Coming up with a name more generic than pg_stat_progress_cluster may be better though if this covers VACUUM FULL as well; user-facing documentation does not say that VACUUM FULL is actually CLUSTER.

Thanks for sharing your thoughts. Agreed. I'll add a new column like "command" to tell whether CLUSTER or VACUUM is running. And how about this new view name?

- pg_stat_progress_reorg

Is it a more general name than the previous one, given that it covers both commands? I'll add this patch to CF2017-09. Any comments or suggestions are welcome.

Nice to see that you are taking the time to implement patches for upstream, Yamada-san!

Same here. :) I'd like to contribute features for DBAs and users. A progress monitoring feature is important, from my DBA experience. I'm happy if you lend a hand.
Thanks, Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 38bf636..33cedc0 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -332,6 +332,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER and VACUUM FULL, showing current progress. + See . + + + @@ -3233,9 +3241,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. + This may be expanded in the future. @@ -3247,9 +3255,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. - Progress reporting is not currently supported for VACUUM FULL - and backends running VACUUM FULL will not be listed in this - view. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command. See . @@ -3427,6 +3434,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + CLUSTER Progress Reporting + + + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. 
+ + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + command + text + + Current processing command: CLUSTER/VACUUM FULL. + + + + phase + text + + Current processing phase of cluster/vacuum full. See or . + + + + scan_method + text + + Scan method of table: index scan/seq scan. + + + + scan_index_relid + bigint + + OID of the index. + + + + heap_tuples_total + bigint + + Total number of heap tuples in the table. This number is reported + as of the beginning of the scan; tuples added later will not
[HACKERS] Changing Jobs
Hi Everyone,

Since the beginning of September I have been working full-time for EDB. The plan for now is to continue working on the projects I was working on, in particular JIT, and to have more time for reviews. I'll be spending the large majority of my time on community PG.

I'm primarily mentioning this to avoid the appearance of a concealed bias when reviewing/committing patches by other contributors from EDB. Besides that, I'll continue to have an advisor position at Citus Data.

Greetings,

Andres Freund
Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization
Applies, compiles, works for me. Very, very minor comments that I should have noticed before; sorry for this additional round trip.

In the help line, move -I just after -i, to put the short options in alphabetical and decreasing-importance order. On this line, also add the information about the default, something like:

  -i, --ini...
  -I, --custom=[...]+  (default "ctgvp") ...

When/if the pgbench TAP test patch gets through, the feature should be tested there as well. No action needed now.

-- 
Fabien.
Re: [HACKERS] paths in partitions of a dummy partitioned table
On Fri, Aug 25, 2017 at 10:46 PM, Robert Haas wrote:
> On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat wrote:
>> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark
>> the partition relations dummy and thus doesn't set any (dummy) paths in
>> the partition relations. The lack of paths in the partitions means that
>> we can not use partition-wise join while joining this table with some
>> other similarly partitioned table, as the partitions do not have any
>> paths that child-joins can use. This means that we can not create
>> partition relations for such a join and thus can not consider the join
>> to be partitioned. This doesn't matter much when the dummy relation is
>> the outer relation, since the resultant join is also dummy. But when the
>> dummy relation is the inner relation, the resultant join is not dummy
>> and can be considered to be partitioned with the same partitioning
>> scheme as the outer relation, to be joined with another similarly
>> partitioned table. Not having paths in the partitions deprives us of
>> this future optimization.
>
> I think it's wrong for any code to be examining the path list for a
> rel marked dummy, so I would suggest approaching this from a different
> direction.

Robert and I had an offline discussion about this. I am summarizing it here for the sake of completeness.

A dummy relation is identified by the only dummy path that exists in its pathlist. There is no flag in RelOptInfo which tells whether a given relation is dummy or not; it's the dummy path which tells that. A dummy path is an Append path with no subpaths. The planner doesn't treat dummy relations any differently from other relations when it comes to using paths. When a dummy relation participates in a join, the dummy path is used as one of the joining paths and converted to a Result plan at planning time.
So, for a partition-wise join where one of the joining relations is dummy, every one of its children must have a dummy path which can be used to construct child-join paths. But we don't need to mark partition relations dummy (when their parent is dummy) if they are not going to participate in a partition-wise join. The partition relations will be marked dummy when we know that they will be required for a partition-wise join. I was worried that we might mark a base relation dummy during join planning this way, but we already have a precedent for that in add_paths_to_join_rel(), so it shouldn't be a problem. I have now added a patch to the partition-wise join set to mark partition relations dummy when their parent is dummy.

> Given A LEFT JOIN B where Bk is dummy, I suggest
> constructing the path for (AB)k by taking a path from Ak and applying
> an appropriate PathTarget. You don't really need a join path at all;
> a path for the non-dummy input is fine - and, in fact, better, since
> it will be cheaper to execute. One problem is that it may not produce
> the correct output columns. (AB) may drop some columns that were
> being generated by A because they were only needed to perform the
> join, and it may add additional columns taken from B. But I think
> that if you take the default PathTarget for (AB) and replace
> references to columns of B with NULL constants, you should be able to
> apply the resulting PathTarget to a path for Ak and get a valid path
> for (AB)k. Maybe there is some reason why that won't work exactly,
> but I think it is probably more promising to attack the problem from
> this angle than to do what you propose. Sticking dummy joins into the
> query plan that are really just projecting out NULLs is not appealing.

This might help in cases where the RelOptInfo itself is missing, e.g. missing partitions in partition matching as discussed in [1]. I will discuss this approach on that thread.
[1] https://www.postgresql.org/message-id/cafjfprdjqvauev5djx3tw6pu5eq54nckadtxhx2jijg_gvb...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Re: [HACKERS] psql - add special variable to reflect the last query status
Hello Tom,

Here is a version 6. A few thoughts about this patch:

> * I think the ERROR_CODE variable should instead be named SQLSTATE.
> That is what the SQL standard calls this string, and it's also what
> just about all our documentation calls it; see PG_DIAG_SQLSTATE in
> libpq, or the SQLSTATE 'x' construct in pl/pgsql, or the sqlstate
> attribute of an exception object in plpython, etc etc.

ERROR_CODE -> SQLSTATE.

> * I'm not exactly convinced that there's a use-case for STATUS

Removed, but I think it was nice to have; it is easier to interpret than error codes and their classes, which I have not memorized yet :-)

> * It might be better if SQLSTATE and ERROR_MESSAGE were left unchanged
> by a non-error query. That would reduce the need to copy them into
> other variables just because you needed to do something else before
> printing them. It'd save a few cycles too.

Added LAST_ERROR_SQLSTATE & LAST_ERROR_MESSAGE, only reset when an error occurs.

> * Speaking of which, has anyone tried to quantify the performance
> impact of this patch? It might well be negligible, but I do not think
> we should assume that without evidence.

My guess is that it is negligible. I am not sure how to measure something negligible, as many very fast queries would have to be executed to see anything significant. Maybe 100,000 "SELECT 1;" in a script?

> * I wonder why you didn't merge this processing into ProcessResult,
> instead of inventing an extra function (whose call site seems rather
> poorly chosen anyhow --- what's the justification for not counting
> this overhead as part of the query runtime?). You could probably
> eliminate the refactoring you did, since it wouldn't be necessary to
> recompute AcceptResult's result that way.

Variable setting moved to the end of ProcessResult; no new functions. The result is clean, so I should have done it like that from the beginning.

Forgotten help stuff added.
-- Fabien.diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 5bdbc1e..0bbe1c9 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -3475,6 +3475,16 @@ bar + ERROR + + + Whether the last query failed, as a boolean. + See also SQLSTATE. + + + + + FETCH_COUNT @@ -3611,6 +3621,18 @@ bar + LAST_ERROR_SQLSTATE + LAST_ERROR_MESSAGE + + + The error code and associated error message of the last + error, or empty strings if no error occured since the + beginning of the script. + + + + + ON_ERROR_ROLLBACK @@ -3679,6 +3701,25 @@ bar + ROW_COUNT + + + How many rows were returned or affected by the last query. + + + + + + SQLSTATE + + + The error code associated to the last query, or + 0 if no error occured. + + + + + QUIET diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index b997058..93950fd 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -493,7 +493,6 @@ ResetCancelConn(void) #endif } - /* * AcceptResult * @@ -1107,6 +1106,35 @@ ProcessResult(PGresult **results) first_cycle = false; } + /* + * Set special variables + * - ERROR: TRUE/FALSE, whether an error occurred + * - SQLSTATE: code of error, or "0" + * - LAST_ERROR_SQLSTATE: same for last error + * - LAST_ERROR_MESSAGE: message of last error + * - ROW_COUNT: how many rows were returned or affected, or "0" + */ + if (success) + { + char *ntuples = PQcmdTuples(*results); + SetVariable(pset.vars, "ERROR", "FALSE"); + SetVariable(pset.vars, "SQLSTATE", "0"); + SetVariable(pset.vars, "ROW_COUNT", *ntuples ? ntuples : "0"); + } + else + { + char *code = PQresultErrorField(*results, PG_DIAG_SQLSTATE); + char *mesg = PQresultErrorField(*results, PG_DIAG_MESSAGE_PRIMARY); + + SetVariable(pset.vars, "ERROR", "TRUE"); + /* if an error was detected, it must have a code! 
*/ + Assert(code != NULL); + SetVariable(pset.vars, "SQLSTATE", code); + SetVariable(pset.vars, "LAST_ERROR_SQLSTATE", code); + SetVariable(pset.vars, "LAST_ERROR_MESSAGE", mesg ? mesg : ""); + SetVariable(pset.vars, "ROW_COUNT", "0"); + } + /* may need this to recover from conn loss during COPY */ if (!first_cycle && !CheckConnection()) return false; @@ -1214,7 +1242,6 @@ PrintQueryResults(PGresult *results) return success; } - /* * SendQuery: send the query string to the backend * (and print out results) diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 4d1c0ec..ae951f5 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -337,7 +337,7 @@ helpVariables(unsigned short int pager) * Windows builds currently print one more line than non-Windows builds. * Using the larger number is fine.
Re: [HACKERS] Replacing lfirst() with lfirst_node() appropriately in planner.c
Ashutosh Bapat wrote:
> On Wed, Sep 6, 2017 at 1:32 AM, Tom Lane wrote:
>> BTW, I think we *could* use "lfirst_node(List, ...)" in cases where
>> we know the list is supposed to be a list of objects rather than ints
>> or Oids. I didn't do anything about that observation, though.
>
> IMO, it won't be apparent as to why some code uses lfirst_node(List,
> ...) and some code uses (List *) lfirst().

Yeah -- based on that argument, I too think we should leave those alone.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] dropping partitioned tables without CASCADE
Okay, I have marked this as ready for committer. Thanks,

On Wed, Sep 6, 2017 at 2:50 PM, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
> On Wed, Sep 6, 2017 at 1:06 PM, Rushabh Lathia wrote:
>>
>> 2) Add partition to the foo;
>>
>> create table foo_p1 partition of foo for values in (1, 2, 3) partition by
>> list (b);
>>
>> postgres=# \d foo
>>                 Table "public.foo"
>>  Column |  Type   | Collation | Nullable | Default
>> --------+---------+-----------+----------+---------
>>  a      | integer |           |          |
>>  b      | integer |           |          |
>> Partition key: LIST (a)
>> Number of partitions: 1 (Use \d+ to list them.)
>>
>> postgres=# \d+ foo
>>                 Table "public.foo"
>>  Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
>> --------+---------+-----------+----------+---------+---------+--------------+-------------
>>  a      | integer |           |          |         | plain   |              |
>>  b      | integer |           |          |         | plain   |              |
>> Partition key: LIST (a)
>> Partitions: foo_p1 FOR VALUES IN (1, 2, 3) has partitions
>>
>> The above verbose output for foo says that foo_p1 "has partitions". But if I do
>>
>> postgres=# \d foo_p1
>>                Table "public.foo_p1"
>>  Column |  Type   | Collation | Nullable | Default
>> --------+---------+-----------+----------+---------
>>  a      | integer |           |          |
>>  b      | integer |           |          |
>> Partition of: foo FOR VALUES IN (1, 2, 3)
>> Partition key: LIST (b)
>> Number of partitions: 0
>>
>> it tells "Number of partitions: 0".
>>
>> I feel like these pieces of information conflict with each other. AFAIU,
>> the idea behind adding "has partitions" was to show that it's a
>> partitioned table. So can you directly use "is partitioned" in place of
>> "has partitions"?
>>
>> Did those changes in the attached patch and updated the regression
>> expected output. Also ran pgindent on the patch.
>
> Looks better. Thanks for the changes. The patch looks good to me.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

-- 
Rushabh Lathia
Re: [HACKERS] document and use SPI_result_code_string()
On Thu, Aug 31, 2017 at 11:23 AM, Peter Eisentraut wrote:
> A lot of semi-internal code just prints out numeric SPI error codes,
> which is not very helpful. We already have an API function
> SPI_result_code_string() to convert the codes to a string, so here is a
> patch to make more use of that and also document it for external use.
>
> Also included are two patches to clarify some RI error handling
> code that I encountered at the same time. Those are simple things.

Agreed for 0001 to untangle things. Fine for 0002. This reminds me of LockGXact and RemoveGXact in twophase.c, as well as _hash_squeezebucket, which have some code paths that cannot return... Any thoughts about having some kind of PG_NOTREACHED, defined to 0, which could be put in an assertion?

+1 for 0003. There are other undocumented functions available to users:
- SPI_plan_is_valid
- SPI_execute_snapshot
- SPI_plan_get_plan_sources
- SPI_plan_get_cached_plan
- SPI_datumTransfer

-- 
Michael
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra wrote:
> On 7/25/17 12:55 AM, Tom Lane wrote:
>> Tomas Vondra writes:
>>> It seems to me that VACUUM and ANALYZE somewhat disagree on what
>>> exactly reltuples means. VACUUM seems to be thinking that reltuples
>>> = live + dead, while ANALYZE apparently believes that reltuples =
>>> live.
>>>
>>> The question is - which of the reltuples definitions is the right
>>> one? I've always assumed that "reltuples = live + dead", but perhaps
>>> not?
>>
>> I think the planner basically assumes that reltuples is the live
>> tuple count, so maybe we'd better change VACUUM to get in step.
>
> Attached is a patch that (I think) does just that. The disagreement was
> caused by VACUUM treating recently dead tuples as live, while ANALYZE
> treats both of those as dead.
>
> At first I was worried that this will negatively affect plans in the
> long-running transaction, as it will get underestimates (due to
> reltuples not including rows it can see). But that's a problem we
> already have anyway, you just need to run ANALYZE in the other session.

Thanks for the patch.

From the mail, I understand that this patch tries to make VACUUM's reltuples update in the catalog table count only the properly visible tuples, similarly to the ANALYZE command.

- num_tuples);
+ num_tuples - nkeep);

With the above correction, there is a problem in reporting the number of live tuples to the stats collector.

postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899818 |     799636 |     100182
(1 row)

The live tuples value is decremented by the dead tuples value a second time before being sent to the stats collector in function lazy_vacuum_rel():

    /* report results to the stats collector, too */
    new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a correction here also.
Or change the correction in the lazy_vacuum_rel() function itself, before updating the catalog table, similarly to the stats.

While testing this patch, I found another problem that is not related to this patch. When the VACUUM command is executed multiple times on a table with no dead rows, the reltuples value slowly decreases.

postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899674 |     899674 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899622 |     899622 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899570 |     899570 |          0
(1 row)

In the lazy_scan_heap() function, we force a scan of the last page of the relation, to avoid taking the access exclusive lock in lazy_truncate_heap() when there are tuples in the last page. Because of this, the scanned_pages value will never be 0, so the vac_estimate_reltuples function will estimate the tuples based on the number of tuples in the last page of the relation. This estimation leads to the reduced reltuples value.

I am wondering whether this problem really happens in real-world scenarios, and whether it is worth producing a fix.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] psql - add special variable to reflect the last query status
Here is a version 6. Small v7 update, sorry for the noise. Add testing the initial state of all variables. Fix typos in a comment in tests. Fix the documentation wrt the current implementation behavior. -- Fabien.diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 5bdbc1e..97962d4 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -3475,6 +3475,16 @@ bar + ERROR + + + Whether the last query failed, as a boolean. + See also SQLSTATE. + + + + + FETCH_COUNT @@ -3611,6 +3621,18 @@ bar + LAST_ERROR_SQLSTATE + LAST_ERROR_MESSAGE + + + The error code and associated error message of the last + error, or "0" and empty strings if no error occured + since the beginning of the script. + + + + + ON_ERROR_ROLLBACK @@ -3679,6 +3701,25 @@ bar + ROW_COUNT + + + How many rows were returned or affected by the last query. + + + + + + SQLSTATE + + + The error code associated to the last query, or + 0 if no error occured. + + + + + QUIET diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index b997058..93950fd 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -493,7 +493,6 @@ ResetCancelConn(void) #endif } - /* * AcceptResult * @@ -1107,6 +1106,35 @@ ProcessResult(PGresult **results) first_cycle = false; } + /* + * Set special variables + * - ERROR: TRUE/FALSE, whether an error occurred + * - SQLSTATE: code of error, or "0" + * - LAST_ERROR_SQLSTATE: same for last error + * - LAST_ERROR_MESSAGE: message of last error + * - ROW_COUNT: how many rows were returned or affected, or "0" + */ + if (success) + { + char *ntuples = PQcmdTuples(*results); + SetVariable(pset.vars, "ERROR", "FALSE"); + SetVariable(pset.vars, "SQLSTATE", "0"); + SetVariable(pset.vars, "ROW_COUNT", *ntuples ? 
ntuples : "0"); + } + else + { + char *code = PQresultErrorField(*results, PG_DIAG_SQLSTATE); + char *mesg = PQresultErrorField(*results, PG_DIAG_MESSAGE_PRIMARY); + + SetVariable(pset.vars, "ERROR", "TRUE"); + /* if an error was detected, it must have a code! */ + Assert(code != NULL); + SetVariable(pset.vars, "SQLSTATE", code); + SetVariable(pset.vars, "LAST_ERROR_SQLSTATE", code); + SetVariable(pset.vars, "LAST_ERROR_MESSAGE", mesg ? mesg : ""); + SetVariable(pset.vars, "ROW_COUNT", "0"); + } + /* may need this to recover from conn loss during COPY */ if (!first_cycle && !CheckConnection()) return false; @@ -1214,7 +1242,6 @@ PrintQueryResults(PGresult *results) return success; } - /* * SendQuery: send the query string to the backend * (and print out results) diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c index 4d1c0ec..ae951f5 100644 --- a/src/bin/psql/help.c +++ b/src/bin/psql/help.c @@ -337,7 +337,7 @@ helpVariables(unsigned short int pager) * Windows builds currently print one more line than non-Windows builds. * Using the larger number is fine. */ - output = PageOutput(147, pager ? &(pset.popt.topt) : NULL); + output = PageOutput(155, pager ? 
&(pset.popt.topt) : NULL); fprintf(output, _("List of specially treated variables\n\n")); @@ -360,6 +360,8 @@ helpVariables(unsigned short int pager) "if set to \"noexec\", just show them without execution\n")); fprintf(output, _(" ENCODING\n" "current client character set encoding\n")); + fprintf(output, _(" ERROR\n" + "whether the last query failed\n")); fprintf(output, _(" FETCH_COUNT\n" "the number of result rows to fetch and display at a time (0 = unlimited)\n")); fprintf(output, _(" HISTCONTROL\n" @@ -374,6 +376,9 @@ helpVariables(unsigned short int pager) "number of EOFs needed to terminate an interactive session\n")); fprintf(output, _(" LASTOID\n" "value of the last affected OID\n")); + fprintf(output, _(" LAST_ERROR_SQLSTATE\n" + " LAST_ERROR_MESSAGE\n" + "error code and message of last error, or \"0\" and empty if none\n")); fprintf(output, _(" ON_ERROR_ROLLBACK\n" "if set, an error doesn't stop a transaction (uses implicit savepoints)\n")); fprintf(output, _(" ON_ERROR_STOP\n" @@ -388,6 +393,8 @@ helpVariables(unsigned short int pager) "specifies the prompt used during COPY ... FROM STDIN\n")); fprintf(output, _(" QUIET\n" "run quietly (same as -q option)\n")); + fprintf(output, _(" ROW_COUNT\n" + "number of rows of last query, or 0\n")); fprintf(output, _(" SERVER_VERSION_NAME\n" " SERVER_VERSION_NUM\n" "server's version (in
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada wrote: > I revised the patch like this: You should avoid top-posting. > I didn't change the name of view (pg_stat_progress_cluster) because I'm not > sure > whether the new name (pg_stat_progress_reorg) is suitable or not. Here are some ideas: rewrite (incorrect for ALTER TABLE), organize (somewhat fine), order, operate (too general?), bundle, reform, assemble. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 【ECPG】strncpy function does not set the end character '\0'
> When we reviewed the ecpg code, we found the array seem not have the > end > character('\0') after using the strncpy function. True. > In the function ECPGnoticeReceiver, we use the strncpy function copy > the > sqlstate to sqlca->sqlstate. And the sqlca->sqlstate is defined as > the size > of 5, and the copy size is sizeof(sqlca->sqlstate). However, from the > previous strcmp function, the sqlstate size may be 5, such as > ECPG_SQLSTATE_INVALID_CURSOR_NAME. So there may be lack of the end > character > for sqlca->sqlstate. Why do you think there should be one? My memory might be wrong but I don't think it's supposed to be a null-terminated string. Michael -- Michael Meskes Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org) Meskes at (Debian|Postgresql) dot Org Jabber: michael at xmpp dot meskes dot org VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Hi, At Mon, 4 Sep 2017 17:17:19 +0900, Michael Paquier wrote in
Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands
Tom Lane wrote: > "Bossart, Nathan" writes: > > On 9/4/17, 10:32 PM, "Simon Riggs" wrote: > >> If we want to keep the code simple we must surely consider whether the > >> patch has any utility. > > > ... I'd argue that this feels like a natural extension of the > > VACUUM command, one that I, like others much earlier in this thread, > > was surprised to learn wasn't supported. > > Yeah. To me, one big argument for allowing multiple target tables is that > we allow it for other common utility commands such as TRUNCATE or LOCK > TABLE. TRUNCATE has an actual feature behind its multi-table ability: you can truncate tables linked by FKs that way, and not otherwise. VACUUM, like LOCK TABLE, has no such benefit. (If one is programmatically locking multiple tables, it is easier to do one table per command than many in one command, anyway.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dropping partitioned tables without CASCADE
I picked this patch for review and started looking at the implementation details. Consider the below test: 1) postgres=# create table foo (a int, b int) partition by list (a); CREATE TABLE postgres=# \d foo Table "public.foo" Column | Type | Collation | Nullable | Default +-+---+--+- a | integer | | | b | integer | | | Partition key: LIST (a) Number of partitions: 0 postgres=# \d+ foo Table "public.foo" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +-+---+--+-+-+--+- a | integer | | | | plain | | b | integer | | | | plain | | Partition key: LIST (a) Number of partitions: 0 In the above case, the verbose as well as the normal output gives information about the number of partitions. 2) Add a partition to foo: create table foo_p1 partition of foo for values in (1, 2, 3) partition by list (b); postgres=# \d foo Table "public.foo" Column | Type | Collation | Nullable | Default +-+---+--+- a | integer | | | b | integer | | | Partition key: LIST (a) Number of partitions: 1 (Use \d+ to list them.) postgres=# \d+ foo Table "public.foo" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +-+---+--+-+-+--+- a | integer | | | | plain | | b | integer | | | | plain | | Partition key: LIST (a) Partitions: foo_p1 FOR VALUES IN (1, 2, 3) has partitions The verbose output for foo above says foo_p1 "has partitions". But if I do postgres=# \d foo_p1 Table "public.foo_p1" Column | Type | Collation | Nullable | Default +-+---+--+- a | integer | | | b | integer | | | Partition of: foo FOR VALUES IN (1, 2, 3) Partition key: LIST (b) Number of partitions: 0 it tells "Number of partitions: 0". I feel this information is conflicting. AFAIU, the idea behind adding "has partitions" was to let the user know that it's a partitioned table. So can you directly add "is partitioned" in place of "has partitions"? I made those changes in the attached patch and updated the regression expected output. I also ran pgindent on the patch. 
Thanks, On Mon, Sep 4, 2017 at 6:22 PM, Ashutosh Bapat < ashutosh.ba...@enterprisedb.com> wrote: > On Mon, Sep 4, 2017 at 3:48 PM, Alvaro Herrera> wrote: > > > > if (tuples > 0) > > { > > - if (tableinfo.relkind != > RELKIND_PARTITIONED_TABLE) > > - printfPQExpBuffer(, > _("Number of child tables: %d (Use \\d+ to list them.)"), tuples); > > - else > > - printfPQExpBuffer(, > _("Number of partitions: %d (Use \\d+ to list them.)"), tuples); > > + printfPQExpBuffer(, _("Number of %s: > %d (Use \\d+ to list them.)"), ct, tuples); > > printTableAddFooter(, buf.data); > > } > > > > Please don't do this, because it breaks translatability. I think the > > original is fine. > > > > We have used this style in the "else" case of if (!verbose). So, I > just copied it. I have removed that change in the attached patch. > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > ... Rushabh Lathia www.EnterpriseDB.com diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c index f6049cc..ab9a637 100644 --- a/src/bin/psql/describe.c +++ b/src/bin/psql/describe.c @@ -2828,7 +2828,7 @@ describeOneTableDetails(const char *schemaname, /* print child tables (with additional info if partitions) */ if (pset.sversion >= 10) printfPQExpBuffer(, - "SELECT c.oid::pg_catalog.regclass, pg_catalog.pg_get_expr(c.relpartbound, c.oid)" + "SELECT c.oid::pg_catalog.regclass, pg_get_expr(c.relpartbound, c.oid), c.relkind" " FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i" " WHERE c.oid=i.inhrelid AND i.inhparent = '%s'" " ORDER BY c.oid::pg_catalog.regclass::pg_catalog.text;", oid); @@ -2851,7 +2851,18 @@ describeOneTableDetails(const char *schemaname, else tuples = PQntuples(result); - if (!verbose) + /* +
Re: [HACKERS] dropping partitioned tables without CASCADE
On Wed, Sep 6, 2017 at 1:06 PM, Rushabh Lathia wrote: > > 2) Add partition to the foo; > > create table foo_p1 partition of foo for values in (1, 2, 3) partition by > list (b); > > postgres=# \d foo > Table "public.foo" > Column | Type | Collation | Nullable | Default > +-+---+--+- > a | integer | | | > b | integer | | | > Partition key: LIST (a) > Number of partitions: 1 (Use \d+ to list them.) > > postgres=# \d+ foo > Table "public.foo" > Column | Type | Collation | Nullable | Default | Storage | Stats target > | Description > +-+---+--+-+-+--+- > a | integer | | | | plain | > | > b | integer | | | | plain | > | > Partition key: LIST (a) > Partitions: foo_p1 FOR VALUES IN (1, 2, 3) has partitions > > Above verbose output for foo says, foo_p1 "has partitions". But if I do > > postgres=# \d foo_p1 >Table "public.foo_p1" > Column | Type | Collation | Nullable | Default > +-+---+--+- > a | integer | | | > b | integer | | | > Partition of: foo FOR VALUES IN (1, 2, 3) > Partition key: LIST (b) > Number of partitions: 0 > > it tell "Number of partitions: 0". > > I feel like information is conflicting with each other. AFAIU, idea about > adding > "has partitions" was to let know that it's a partitioned table. So can you > directly > add the "is partitioned" in place of "has partitions"? > > Did those change in the attached patch and update regression expected > output. > Looks better. > Also run pgindent on the patch. > Thanks for the changes. The patch looks good to me. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup behavior on non-existent slot
Magnus Hagander wrote: > On Mon, Sep 4, 2017 at 3:21 PM, Jeff Janes wrote: > > Should the parent process of pg_basebackup be made to respond to SIGCHLD? > > Or call waitpid(bgchild, , WNOHANG) in some strategic loop? > > I think it's ok to just call waitpid() -- we don't need to react super > quickly, but we should react. Hmm, not sure about that ... in the normal case (slotname is correct) you'd be doing thousands of useless waitpid() system calls during the whole operation, no? I think it'd be better to have a SIGCHLD handler that sets a flag (just once), which can be quickly checked without accessing kernel space. > And we should then exit the main process with an error before actually > streaming everything. Right. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 【ECPG】strncpy function does not set the end character '\0'
Michael Meskes writes: >> In the function ECPGnoticeReceiver, we use the strncpy function copy >> the >> sqlstate to sqlca->sqlstate. And the sqlca->sqlstate is defined as >> the size >> of 5, and the copy size is sizeof(sqlca->sqlstate). However, from the >> previous strcmp function, the sqlstate size may be 5, such as >> ECPG_SQLSTATE_INVALID_CURSOR_NAME. So there may be lack of the end >> character >> for sqlca->sqlstate. > Why do you think there should be one? My memory might be wrong but I > don't think it's supposed to be a null terminated string. That field is defined as char[5] in struct sqlca_t, so the intent is clearly that it not be null terminated. However, it looks to me like there'd be at least one alignment-padding byte after it, and that byte is likely to be 0 in a lot of situations (particularly for statically allocated sqlca_t's). So a lot of the time, you could get away with using strcmp() or other functions that expect null termination. I'm thinking therefore that there's probably code out there that tries to do strcmp(sqlca->sqlstate, "22000") or suchlike, and it works often enough that the authors haven't identified their bug. The question is do we want to try to make that be valid code. Changing the field declaration to char[5+1] would be easy enough, but I have no idea how many places in ecpglib would need to change to make sure that the last byte gets set to 0. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] <> join selectivity estimate question
On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munrowrote: > > Thanks. Bearing all that in mind, I ran through a series of test > scenarios and discovered that my handling for JOIN_ANTI was wrong: I > thought that I had to deal with inverting the result, but I now see > that that's handled elsewhere (calc_joinrel_size_estimate() I think). > So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the > same way. I agree, esp. after looking at eqjoinsel_semi(), which is used for both semi and anti joins, it becomes more clear. > > That just leaves the question of whether we should try to handle the > empty RHS and single-value RHS cases using statistics. My intuition > is that we shouldn't, but I'll be happy to change my intuition and > code that up if that is the feedback from planner gurus. Empty RHS can result from dummy relations also, which are produced by constraint exclusion, so may be that's an interesting case. Single value RHS may be interesting with partitioned table with all rows in a given partition end up with the same partition key value. But may be those are just different patches. I am not sure. > > Please find attached a new version, and a test script I used, which > shows a bunch of interesting cases. I'll add this to the commitfest. I added some "stable" tests to your patch taking inspiration from the test SQL file. I think those will be stable across machines and runs. Please let me know if those look good to you. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 23e5526..4c1bae6 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -2697,26 +2697,63 @@ neqjoinsel(PG_FUNCTION_ARGS) Oid eqop; float8 result; - /* - * We want 1 - eqjoinsel() where the equality operator is the one - * associated with this != operator, that is, its negator. 
- */ - eqop = get_negator(operator); - if (eqop) + + if (jointype == JOIN_SEMI || jointype == JOIN_ANTI) { - result = DatumGetFloat8(DirectFunctionCall5(eqjoinsel, - PointerGetDatum(root), - ObjectIdGetDatum(eqop), - PointerGetDatum(args), - Int16GetDatum(jointype), - PointerGetDatum(sjinfo))); + VariableStatData leftvar; + VariableStatData rightvar; + double nullfrac; + bool reversed; + HeapTuple statsTuple; + + get_join_variables(root, args, sjinfo, , , ); + statsTuple = reversed ? rightvar.statsTuple : leftvar.statsTuple; + if (HeapTupleIsValid(statsTuple)) + nullfrac = ((Form_pg_statistic) GETSTRUCT(statsTuple))->stanullfrac; + else + nullfrac = 0.0; + ReleaseVariableStats(leftvar); + ReleaseVariableStats(rightvar); + + /* + * For semi-joins, if there is more than one distinct value in the RHS + * relation then every non-null LHS row must find a row to join since + * it can only be equal to one of them. We'll assume that there is + * always more than one distinct RHS value for the sake of stability, + * though in theory we could have special cases for empty RHS + * (selectivity = 0) and single-distinct-value RHS (selectivity = + * fraction of LHS that has the same value as the single RHS value). + * + * For anti-joins, if we use the same assumption that there is more + * than one distinct key in the RHS relation, then every non-null LHS + * row must be suppressed by the anti-join leaving only nullfrac. + */ + result = 1.0 - nullfrac; } else { - /* Use default selectivity (should we raise an error instead?) */ - result = DEFAULT_EQ_SEL; + /* + * We want 1 - eqjoinsel() where the equality operator is the one + * associated with this != operator, that is, its negator. 
+ */ + eqop = get_negator(operator); + if (eqop) + { + result = DatumGetFloat8(DirectFunctionCall5(eqjoinsel, + PointerGetDatum(root), + ObjectIdGetDatum(eqop), + PointerGetDatum(args), + Int16GetDatum(jointype), + PointerGetDatum(sjinfo))); + } + else + { + /* Use default selectivity (should we raise an error instead?) */ + result = DEFAULT_EQ_SEL; + } + result = 1.0 - result; } - result = 1.0 - result; + PG_RETURN_FLOAT8(result); } diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 9f4c88d..10bfb68 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -1845,6 +1845,89 @@ SELECT '' AS "xxx", * | 1 | 4 | one | -1 (1 row) +-- SEMI and ANTI join neq selectivity +ANALYZE J1_TBL; +ANALYZE J2_TBL; +EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF) +SELECT * FROM J1_TBL t1 +WHERE EXISTS (SELECT * FROM J2_TBL t2 WHERE t1.i <> t2.i); +QUERY PLAN +--- + Nested Loop Semi Join (actual rows=9
Re: [HACKERS] dropping partitioned tables without CASCADE
On 2017/09/06 18:46, Rushabh Lathia wrote: > Okay, I have marked this as ready for committer. Thanks Ashutosh and Rushabh for rebasing and improving the patch. Looks good to me too. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE
On Fri, Jun 2, 2017 at 6:31 PM, Amit Kapila wrote: > > Your reasoning sounds sensible to me. I think the other way to attack > this problem is that we can maintain some local queue in each of the > workers when the shared memory queue becomes full. Basically, we can > extend your "Faster processing at Gather node" patch [1] such that > instead of fixed sized local queue, we can extend it when the shm > queue become full. I think that way we can handle both the problems > (worker won't stall if shm queues are full and workers can do batched > writes in shm queue to avoid the shm queue communication overhead) in > a similar way. > > > [1] - > https://www.postgresql.org/message-id/CAOGQiiMwhOd5-iKZnizn%2BEdzZmB0bc3xa6rKXQgvhbnQ29zCJg%40mail.gmail.com > I worked on this idea of using a local queue as a temporary buffer to write the tuples when the master is busy and the shared queue is full, and it gives quite some improvement in query performance. Design: On a basic level, the design of this patch can be explained as follows: similar to shm_mq, there is a new structure local_mq which is private to each worker. Once the shared queue is full, we write the tuple to the local queue. Since the local queue is never shared, we do not need any sort of locking for writing to it, hence writing to the local queue is a cheap operation. Once the local queue is at least 5% full (for this version I've kept this value, but we might need to modify it) we copy the data from the local to the shared queue. In case both queues are full, we wait till the master reads from the shared queue, then copy some data from the local to the shared queue till the required space is available, and subsequently write the tuple to the local queue. If at any instant the local queue becomes empty, then we write the tuple to the shared queue itself, provided there is space. At worker shutdown we copy all the data from the local queue to the shared queue. 
For this version of the patch I have kept the size of local queue = 100 * PARALLEL_TUPLE_QUEUE_SIZE = 6553600, which might not be the best and I am open to understand the reasons for modifying it. But it is kept that way for the scenarios where gather/gather-merge node is slow. And I expect when a master is busy it might be for some long time or the data to be processed is high and we would not want our worker to wait for some long time. Performance: These experiments are on TPC-H scale factor 20. The patch is giving around 20-30% performance improvement in queries with selectivity something around 20-30%. Head: Default plan explain analyse select * from lineitem where l_extendedprice < 5 and l_orderkey < 1500; QUERY PLAN - Index Scan using idx_lineitem_orderkey on lineitem (cost=0.57..334367.85 rows=10313822 width=129) (actual time=0.057..26389.587 rows=10258702 loops=1) Index Cond: (l_orderkey < 1500) Filter: (l_extendedprice < '5'::numeric) Rows Removed by Filter: 4737888 Planning time: 1.686 ms Execution time: 27402.801 ms (6 rows) Force parallelism plan explain analyse select * from lineitem where l_extendedprice < 5 and l_orderkey < 1500; QUERY PLAN - Gather (cost=0.57..193789.78 rows=10313822 width=129) (actual time=0.354..41153.916 rows=10258702 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Index Scan using idx_lineitem_orderkey on lineitem (cost=0.57..193789.78 rows=2578456 width=129) (actual time=0.062..6530.167 rows=2051740 loops=5) Index Cond: (l_orderkey < 1500) Filter: (l_extendedprice < '5'::numeric) Rows Removed by Filter: 947578 Planning time: 0.383 ms Execution time: 42027.645 ms (9 rows) Patch: Force parallelism plan explain analyse select * from lineitem where l_extendedprice < 5 and l_orderkey < 1500; QUERY PLAN - Gather (cost=0.57..193789.78 rows=10313822 width=129) (actual time=0.413..16690.294 rows=10258702 loops=1) Workers Planned: 4 Workers Launched: 4 -> Parallel Index Scan using idx_lineitem_orderkey on lineitem 
(cost=0.57..193789.78 rows=2578456 width=129) (actual time=0.047..6185.527 rows=2051740 loops=5) Index Cond: (l_orderkey < 1500) Filter: (l_extendedprice < '5'::numeric) Rows Removed by Filter: 947578 Planning time: 0.406 ms Execution time: 17616.750 ms (9 rows) Head: Default plan explain analyse select * from lineitem where l_extendedprice < 5 and l_orderkey < 3000; QUERY PLAN
Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel
On Sat, Jul 22, 2017 at 12:05 AM, Alexey Chernyshovwrote: > the following patch transfers functionality from gevel module > (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for > analyzing GIN and GiST indexes to pageinspect. Gevel was originally > designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and > GIN indexes. > > Functions added: > - gist_stat(text) - shows statistics on GiST Tree > - gist_tree(text) - shows GiST tree > - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL > - gist_print(text) - prints objects stored in GiST tree > - spgist_stat(text) - shows statistics on SP-GiST > - spgist_print(text) - prints objects stored in index > - gin_value_count() - originally gin_stat(text) - prints estimated counts > for index values > - gin_stats() - originally gin_statpage(text) - shows statistics > - gin_count_estimate(text, tsquery) - shows number of indexed rows matched > query > > Tests also transferred, docs for new functions are added. I run pgindent > over the code, but the result is different from those I expected, so I leave > pgindented one. > The patch is applicable to the commit > 866f4a7c210857aa342bf901558d170325094dde. Hi Alexey, This patch still applies but doesn't build after commits 2cd70845 and c6293249. ginfuncs.c:91:47: error: invalid type argument of ‘->’ (have ‘FormData_pg_attribute’) st->index->rd_att->attrs[st->attnum]->attbyval, ...several similar errors... For example, that expression needs to be changed to: TupleDescAttr(st->index->rd_att, attnum)->attbyval Here is some superficial review since I'm here: +/* + * process_tuples + * Retrieves the number of TIDs in a tuple. + */ +static void +process_tuple(FuncCallContext *funcctx, GinStatState * st, IndexTuple itup) "process_tuples" vs "process_tuple" + /* do no distiguish various null category */ "do not distinguish various null categories" + st->nulls[0] = (st->category == GIN_CAT_NORM_KEY) ? 
false : true; That's a long way to write (st->category != GIN_CAT_NORM_KEY)! + * We scaned the whole page, so we should take right page "scanned" +/* + * refind_position + * Refinds a previois position, at returns it has correctly + * set offset and buffer is locked. + */ "previous", s/, at returns/. On return/ + memset(st->nulls, false, 2 * sizeof(*st->nulls)); "false" looks strange here where an int is expected. The size expression is weird. I think you should just write: st->nulls[0] = false; st->nulls[1] = false; Investigating the types more closely, I see that 'nulls' is declared like this (in struct GinStatState): + char nulls[2]; Then later you do this: + htuple = heap_form_tuple(funcctx->attinmeta->tupdesc, + st->dvalues, + st->nulls); But that's not OK, heap_form_tuple takes bool *. bool and char are not interchangeable (they may have different sizes on some platforms, among other problems, even though usually they are both 1 byte). So I think you should declare it as "bool nulls[2]". Meanwhile in TypeStorage you have a member "bool *nulls", which you then initialise like so: + st->nulls = (char *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls)); That cast is wrong. +/* + * gin_count_estimate + * Outputs number of indexed rows matched query. It doesn't touch heap at + * all. Maybe "... number of indexed rows matching query."? + if (attnum < 0 || attnum >= st->index->rd_att->natts) + elog(ERROR, "Wrong column's number"); + st->attnum = attnum; Maybe "invalid column number" or "invalid attribute number"? + elog(ERROR, "Column type is not a tsvector"); s/C/c/ (according to convention). + * Prints various stat about index internals. s/stat/stats/ + elog(ERROR, "Relation %s.%s is not an index", s/R/r/ + elog(ERROR, "Index %s.%s has wrong type", s/I/i/ + gin_value_count prints estimated counts for each + indexed values, single-argument function will return result for a first + column of index. For example: "... for each indexed value. 
The single argument version will return results for the first column of an index. For example:" + gin_count_estimate outputs number of indexed + rows matched query. It doesn't touch heap at all. For example: "... outputs the number of indexed rows matched by a query. ..." + gist_print prints objects stored in + GiST tree, works only if objects in index have + textual representation (type_out functions should be + implemented for the given object type). It's known to work with R-tree + GiST based index. Note, in example below, objects are + of type box. For example: "... prints objects stored in a GiST tree. it works only if the objects in the index have textual representation (type_out functions should be implemented for the given object type). It's known to work with R-tree GiST-based indexes. Note: in the example below, objects are of type box. For example:" +
Re: [HACKERS] additional contrib test suites
On Sat, Aug 12, 2017 at 1:20 PM, Peter Eisentrautwrote: > Here are some small test suites for some contrib modules as well as > pg_archivecleanup that didn't have one previously, as well as one patch > to improve code coverage in a module. > > Will add to commit fest. Testing on different platforms and with > different build configurations would be useful. After applying these patches cleanly on top of 0b554e4e63a4ba4852c01951311713e23acdae02 and running "./configure --enable-tap-tests --with-tcl --with-python --with-perl --with-ldap --with-icu && make && make check-world" I saw this failure: cd . && TESTDIR='/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup' PATH="/home/travis/build/postgresql-cfbot/postgresql/tmp_install/usr/local/pgsql/bin:$PATH" LD_LIBRARY_PATH="/home/travis/build/postgresql-cfbot/postgresql/tmp_install/usr/local/pgsql/lib" PGPORT='65432' PG_REGRESS='/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/regress/pg_regress' /usr/bin/prove -I ../../../src/test/perl/ -I . t/*.pl t/010_pg_archivecleanup.pl .. 1/42 # Failed test 'fails if restart file name is not specified: matches' # at /home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm line 330. # 'pg_archivecleanup: must specify oldest kept WAL file # Try "pg_archivecleanup --help" for more information. # ' # doesn't match '(?^:must specify restartfilename)' # Failed test 'fails with too many parameters: matches' # at /home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm line 330. # 'pg_archivecleanup: too many command-line arguments # Try "pg_archivecleanup --help" for more information. # ' # doesn't match '(?^:too many parameters)' # Failed test 'fails with invalid restart file name: matches' # at /home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm line 330. 
# 'pg_archivecleanup: invalid file name argument # Try "pg_archivecleanup --help" for more information. # ' # doesn't match '(?^:invalid filename)' # Looks like you failed 3 tests of 42. t/010_pg_archivecleanup.pl .. Dubious, test returned 3 (wstat 768, 0x300) Failed 3/42 subtests Test Summary Report --- t/010_pg_archivecleanup.pl (Wstat: 768 Tests: 42 Failed: 3) Failed tests: 12, 16, 18 Non-zero exit status: 3 Files=1, Tests=42, 0 wallclock secs ( 0.03 usr 0.00 sys + 0.05 cusr 0.00 csys = 0.08 CPU) Result: FAIL -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance of generic atomics
On 2017-09-06 07:23, Tom Lane wrote: Jeff Janes writes: What scale factor and client count? How many cores per socket? It looks like Sokolov was just starting to see gains at 200 clients on 72 cores, using -N transaction. This means that Sokolov's proposed changes in atomics/generic.h ought to be uninteresting for performance on any platform we care about --- at least for pg_atomic_fetch_add_u32(). May I cite you this way: "Tom Lane says the PostgreSQL core team doesn't care for 4-socket 72-core Intel Xeon servers"??? To be honest, I didn't measure the exact impact of changing pg_atomic_fetch_add_u32. I found an issue with pg_atomic_fetch_or_u32, because it is highly contended both in LWLock and in buffer page state management. I found that changing the loop for the generic pg_atomic_fetch_or_u32 already gives an improvement. And I changed the loops of the other generic atomic functions to make all the functions uniform. It is bad style not to change worse (but correct) code into better (and also correct) code just because you can't measure the difference (in my opinion; perhaps I am mistaken). (And yes: the gcc intrinsic gives more improvement.) -- With regards, Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance of generic atomics
On 5 September 2017 at 21:23, Tom Lane wrote: > Jeff Janes writes: >> What scale factor and client count? How many cores per socket? It looks >> like Sokolov was just starting to see gains at 200 clients on 72 cores, >> using -N transaction. ... > Moreover, it matters which primitive you're testing, on which platform, > with which compiler, because we have a couple of layers of atomic ops > implementations. ... I think Sokolov was aiming at 4-socket servers specifically, rather than as a general performance gain. If there is no gain on 2-socket, at least there is no loss either. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Copyright in partition.h and partition.c
> On 06 Sep 2017, at 02:56, Amit Langote wrote: > > On 2017/09/05 21:14, Tom Lane wrote: >> Amit Langote writes: >>> On 2017/09/05 15:48, Etsuro Fujita wrote: Here is the copyright in partition.h: * Copyright (c) 2007-2017, PostgreSQL Global Development Group I think it's reasonable that that matches the copyright in partition.c, but partition.c has: * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group * Portions Copyright (c) 1994, Regents of the University of California Is that intentional? >> >>> No, it's unintentional. The difference may have resulted from copying >>> different files to become partition.h and partition.c, respectively. >> >>> Maybe, we should change both to say 2016-2017? >> >>> I don't know the exact rule for how we determine those years. Is there >>> some rule in place about that? When I look at execParallel.c, which >>> supposedly got introduced into the tree recently, I see 1996-2017. OTOH, >>> the files in contrib/bloom all have 2016-2017. >> >> Our usual practice is to write the copyright like it is in partition.c >> even in new files. This avoids any question about whether any of the >> code was copied-and-pasted from somewhere else in PG. Even if not one >> word in the file can be traced to code that was somewhere else before, >> it seems to me that this is an appropriate thing to do, to give due >> credit to those who came before us. > > Agreed. > >> In short: we should make partition.h's copyright look like partition.c's >> not vice versa. > > Attached patch does that. This reminded me that I’d seen one of these before while hacking, and with some grep and xargs abuse I spotted one more (there might be more that my command line fu didn’t catch though). Attached could perhaps be included with the above patch? Perhaps the copyright script should be expanded to catch these? 
(and I volunteer to attempt that unless it’s deemed an uninteresting feature) cheers ./daniel header_copyright.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
Hi Ashutosh, I have tried to address your comments in V27 patch series[1]. Please find my comments inlined. > >> > >> The current set of patches contains 6 patches as below: > >> > >> 0001: > >> Refactoring existing ATExecAttachPartition code so that it can be used > >> for > >> default partitioning as well > If I understand correctly these comments refer to patch 0001 of V25_rebase series, which is related to "Fix assumptions that get_qual_from_partbound()" and not refactoring. This is patch 0001 in V27 now. * Returns an expression tree describing the passed-in relation's partition > - * constraint. > + * constraint. If there are no partition constraints returns NULL e.g. in > case > + * default partition is the only partition. > The first sentence uses singular constraint. The second uses plural. Given > that > partition bounds together form a single constraint we should use singular > constraint in the second sentence as well. > I have changed the wording now. > > Do we want to add a similar comment in the prologue of > generate_partition_qual(). The current wording there seems to cover this > case, > but do we want to explicitly mention this case? > I have added a comment there. > > +if (!and_args) > +result = NULL; > While this is correct, I am increasingly seeing (and_args != NIL) usage. > Changed this to: + if (and_args == NIL) + result = NULL; > get_partition_qual_relid() is called from pg_get_partition_ > constraintdef(), > constr_expr = get_partition_qual_relid(relationId); > > /* Quick exit if not a partition */ > if (constr_expr == NULL) > PG_RETURN_NULL(); > The comment is now wrong since a default partition may have no > constraints. May > be rewrite it as simply, "Quick exit if no partition constraint." > Fixed. > > generate_partition_qual() has three callers and all of them are capable of > handling NIL partition constraint for default partition. 
May be it's > better to > mention in the commit message that we have checked that the callers of > this function > can handle NIL partition constraint. > Added in commit message. > >> > >> 0002: > >> This patch teaches the partitioning code to handle the NIL returned by > >> get_qual_for_list(). > >> This is needed because a default partition will not have any constraints > >> in case > >> it is the only partition of its parent. > Comments below refer to patch 0002 in V25_rebase(0003 in V25), which adds basic support for default partition, which is now 0002 in V27. > If the partition being dropped is the default partition, > heap_drop_with_catalog() locks default partition twice, once as the default > partition and the second time as the partition being dropped. So, it will > be > counted as locked twice. There doesn't seem to be any harm in this, since > the > lock will be help till the transaction ends, by when all the locks will be > released. > Fixed. > Same is the case with cache invalidation message. If we are dropping > default > partition, the cache invalidation message on "default partition" is not > required. Again this might be harmless, but better to avoid it. Fixed. > Similar problems exists with ATExecDetachPartition(), default partition > will be > locked twice if it's being detached. > In ATExecDetachPartition() we do not have OID of the relation being detached available at the time we lock default partition. Moreover, here we are taking an exclusive lock on default OID and an share lock on partition being detached. As you correctly said in your earlier comment that it will be counted as locked twice, which to me also seems harmless. As these locks are to be held till commit of the transaction nobody else is supposed to be releasing these locks in between. I am not able to visualize a problem here, but still I have tried to avoid locking the default partition table twice, please review the changes and let me know your thoughts. 
> +/* > + * If this is a default partition, pg_partitioned_table must have > it's > + * OID as value of 'partdefid' for it's parent (i.e. rel) entry. > + */ > +if (castNode(PartitionBoundSpec, boundspec)->is_default) > +{ > +Oidpartdefid; > + > +partdefid = get_default_partition_oid(RelationGetRelid(rel)); > +Assert(partdefid == inhrelid); > +} > Since an accidental change or database corruption may change the default > partition OID in pg_partition_table. An Assert won't help in such a case. > May > be we should throw an error or at least report a warning. If we throw an > error, > the table will become useless (or even the database will become useless > RelationBuildPartitionDesc is called from RelationCacheInitializePhase3() > on > such a corrupted table). To avoid that we may raise a warning. > > I have added a warning here instead of Assert. > I am wondering whether we could avoid call to
Re: [HACKERS] Fix performance of generic atomics
On 2017-09-06 14:54, Sokolov Yura wrote: On 2017-09-06 07:23, Tom Lane wrote: Jeff Janes writes: What scale factor and client count? How many cores per socket? It looks like Sokolov was just starting to see gains at 200 clients on 72 cores, using -N transaction. This means that Sokolov's proposed changes in atomics/generic.h ought to be uninteresting for performance on any platform we care about --- at least for pg_atomic_fetch_add_u32(). May I cite you this way: I apologize for the tone of my previous message. My tongue is my enemy. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On 2017/09/04 10:10, Amit Langote wrote: > On 2017/09/02 2:52, Robert Haas wrote: >> It strikes me that this patch set is doing two things but maybe in the >> opposite order that I would have chosen to attack them. First, >> there's getting partition pruning to use something other than >> constraint exclusion. Second, there's deferring work that is >> currently done at an early stage of the process until later, so that >> we waste less effort on partitions that are ultimately going to be >> pruned. > > OK. > >> Therefore, IMHO, it would be best to focus first on how we're going to >> identify the partitions that survive pruning, and then afterwards work >> on transposing that logic to happen before partitions are opened and >> locked. That way, we get some incremental benefit sooner, and also >> unblock some other development work. > > Alright, I will try to do it that way. Attached set of patches that does things that way. Individual patches described below: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner This will allow us to perform scan and join planning in a per partition sub-tree manner, with each sub-tree's root getting its own RelOptInfo. Previously, only the root of the whole partition tree would get a RelOptInfo, along with the leaf partitions, with each leaf partition's AppendRelInfo pointing to the former as its parent. This is essential, because we will be doing partition-pruning for every partitioned table in the tree by matching query's scan keys with its partition key. We won't be able to do that if the intermediate partitioned tables didn't have a RelOptInfo. [PATCH 2/5] WIP: planner-side changes for partition-pruning This patch adds a stub get_partitions_for_keys in partition.c with a suitable interface for the caller to pass bounding keys extracted from the query and other related information. 
Importantly, it implements the logic within the planner to match query's scan keys to a parent table's partition key and form the bounding keys that will be passed to partition.c to compute the list of partitions that satisfy those bounds. Also, it adds certain fields to RelOptInfos of the partitioned tables that reflect its partitioning properties. [PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch} This guy modifies the partition bound comparison function so that the caller can pass incomplete partition key tuple that is potentially a prefix of a multi-column partition key. Currently, the input tuple must contain all of key->partnatts values, but that may not be true for queries, which may not have restrictions on all the partition key columns. [PATCH 4/5] WIP: Implement get_partitions_for_keys() This one fills the get_partitions_for_keys stub with the actual logic that searches the partdesc->boundinfo for partition bounds that match the bounding keys specified by the caller. [PATCH 5/5] Add more tests for the new partitioning-related planning More tests. Some TODO items still remain to be done: * Process OR clauses to use for partition-pruning * Process redundant clauses (such as a = 10 and a > 1) more smartly * Other tricks that are missing * Fix bugs * More tests Thanks, Amit From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001 From: amitDate: Wed, 6 Sep 2017 09:28:14 +0900 Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner ...except when the partitioned table in question is the result rel of the query. This allows us perform scan and join planning for each sub-tree in a given partition tree, with each sub-tree's root partitioned table getting its own RelOptInfo. Previously only the root of the whole partition tree got a RelOptInfo, along with the leaf partitions, with each leaf partition's AppendRelInfo pointing to the former as its parent. 
--- src/backend/optimizer/path/allpaths.c | 34 ++- src/backend/optimizer/plan/planner.c | 3 +- src/backend/optimizer/prep/prepunion.c | 166 + src/backend/optimizer/util/plancat.c | 7 +- 4 files changed, 146 insertions(+), 64 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 2d7e1d84d0..6c3511bd47 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte; rte = planner_rt_fetch(rel->relid, root); + + /* +* Get the partitioned_rels list from root->pcinfo_list after +* confirming that rel is actually a root partitioned table. +*/ if (rte->relkind == RELKIND_PARTITIONED_TABLE) { - partitioned_rels = get_partitioned_child_rels(root, rel->relid); - /* The root partitioned table is included as a child rel */ - Assert(list_length(partitioned_rels) >= 1); +
Re: [HACKERS] document and use SPI_result_code_string()
Michael Paquier writes: > Fine for 0002. This reminds me of LockGXact and RemoveGXact in > twophase.c, as well as _hash_squeezebucket that have some code paths > that cannot return... Any thoughts about having some kind of > PG_NOTREACHED defined to 0 which could be put in an assertion? Generally we just do "Assert(false)", maybe with "not reached" in a comment. I don't feel a strong need to invent a new way to do that. regards, tom lane
Re: [HACKERS] <> join selectivity estimate question
Simon Riggs writes: > Why isn't this an open item for PG10? Why should it be? This behavior has existed for a long time. regards, tom lane
Re: [HACKERS] Fix performance of generic atomics
Sokolov Yura writes: > On 2017-09-06 15:56, Tom Lane wrote: >> The point I'm trying to make is that if tweaking generic.h improves >> performance then it's an indicator of missed cases in the less-generic >> atomics code, and the latter is where our attention should be focused. >> I think basically all of the improvement Sokolov got was from upgrading >> the coverage of generic-gcc.h. > Not exactly. I've checked that the new version of the generic > pg_atomic_fetch_or_u32 > loop also gives an improvement. But once you put in the generic-gcc version, that's not reached anymore. regards, tom lane
Re: [HACKERS] <> join selectivity estimate question
On 6 September 2017 at 04:14, Ashutosh Bapat wrote: > On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro > wrote: >> >> Thanks. Bearing all that in mind, I ran through a series of test >> scenarios and discovered that my handling for JOIN_ANTI was wrong: I >> thought that I had to deal with inverting the result, but I now see >> that that's handled elsewhere (calc_joinrel_size_estimate() I think). >> So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the >> same way. > > I agree, esp. after looking at eqjoinsel_semi(), which is used for > both semi and anti joins, it becomes more clear. > >> >> That just leaves the question of whether we should try to handle the >> empty RHS and single-value RHS cases using statistics. My intuition >> is that we shouldn't, but I'll be happy to change my intuition and >> code that up if that is the feedback from planner gurus. > > Empty RHS can result from dummy relations also, which are produced by > constraint exclusion, so may be that's an interesting case. Single > value RHS may be interesting with partitioned table with all rows in a > given partition end up with the same partition key value. But may be > those are just different patches. I am not sure. > >> >> Please find attached a new version, and a test script I used, which >> shows a bunch of interesting cases. I'll add this to the commitfest. > > I added some "stable" tests to your patch taking inspiration from the > test SQL file. I think those will be stable across machines and runs. > Please let me know if those look good to you. Why isn't this an open item for PG10? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Fix performance of generic atomics
Simon Riggs writes: > On 5 September 2017 at 21:23, Tom Lane wrote: >> Moreover, it matters which primitive you're testing, on which platform, >> with which compiler, because we have a couple of layers of atomic ops >> implementations. > If there is no gain on 2-socket, at least there is no loss either. The point I'm trying to make is that if tweaking generic.h improves performance then it's an indicator of missed cases in the less-generic atomics code, and the latter is where our attention should be focused. I think basically all of the improvement Sokolov got was from upgrading the coverage of generic-gcc.h. regards, tom lane
Re: [HACKERS] Fix performance of generic atomics
On 2017-09-06 15:56, Tom Lane wrote: Simon Riggs writes: On 5 September 2017 at 21:23, Tom Lane wrote: Moreover, it matters which primitive you're testing, on which platform, with which compiler, because we have a couple of layers of atomic ops implementations. If there is no gain on 2-socket, at least there is no loss either. The point I'm trying to make is that if tweaking generic.h improves performance then it's an indicator of missed cases in the less-generic atomics code, and the latter is where our attention should be focused. I think basically all of the improvement Sokolov got was from upgrading the coverage of generic-gcc.h. regards, tom lane Not exactly. I've checked that the new version of the generic pg_atomic_fetch_or_u32 loop also gives an improvement. Without that check I would not have suggested fixing the generic atomic functions. Of course, the gcc intrinsic gives more gain. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company
[HACKERS] Fix bloom WAL tap test
Hi! I just realized that these lines of contrib/bloom/t/001_wal.pl don't check that queries give the same results on master and standby. They just check that *return codes* of psql are equal. # Run test queries and compare their result > my $master_result = $node_master->psql("postgres", $queries); > my $standby_result = $node_standby->psql("postgres", $queries); > is($master_result, $standby_result, "$test_name: query result matches"); Attached patch fixes this problem by using safe_psql(), which returns the psql output instead of the return code. For safety, this patch replaces psql() with safe_psql() in other places too. I think this should be backpatched to 9.6 as a bugfix. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company fix-bloom-wal-check.patch Description: Binary data
Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.
Simon Riggs writes: > Based upon input from Tom and Fabien, I propose this additional doc patch. I do not think any of this is appropriate, particularly not the reference to 7.0.3. regards, tom lane
[HACKERS] Is anything preventing us from allowing write to foreign tables from standby?
Hi! We currently block write queries on a standby even if they only modify the contents of foreign tables. But do we have serious reasons for that? Keeping FDW-sharding in mind, making foreign tables writable from a standby would be good to prevent a single-master bottleneck. I wrote a simple patch enabling writes to foreign tables from standbys. It works at first glance for me. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company standby-fdw-writable.patch Description: Binary data
Re: [HACKERS] Walsender timeouts and large transactions
I've changed the status to "Needs review" to gain more attention from others.
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
Peter Eisentraut wrote: > I propose splitting the single Perl script into three separate test > files: one for basic command-line option handling and so on (I would > like to expand that later), one for the main upgrade test, and one for > the funny database names tests. Check. > We also need to have a plan for handling the build farm. Maybe keep the > vcregress.pl upgradecheck target as a thin wrapper for the time being? The buildfarm already runs "make check" in src/bin/ when TAP tests are enabled, which should be enough to trigger the rewritten test, no? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On 4/14/17 02:00, Michael Paquier wrote: > Attached is an updated patch to use --no-sync with pg_dumpall calls. Please send a rebased patch. I propose splitting the single Perl script into three separate test files: one for basic command-line option handling and so on (I would like to expand that later), one for the main upgrade test, and one for the funny database names tests. In the testing file, you removed the paragraph that explains how to do cross-version upgrade testing. It's unfortunate that we would lose that functionality. What can we do about that? We also need to have a plan for handling the build farm. Maybe keep the vcregress.pl upgradecheck target as a thin wrapper for the time being? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.
On 5 September 2017 at 11:58, Fabien COELHO wrote: > > Hello Simon, > >> Does raise the further question of how psql behaves when we connect to >> a pre-10 server, so we have SERVER_VERSION_NUM but yet it is not set. >> How does this >> \if SERVER_VERSION_NUM < x > > > The if does not have expressions (yet), it just handles TRUE/ON/1 and > FALSE/0/OFF. > >> Do we need some macro or suggested special handling? > > > If "SERVER_VERSION_NUM" is available in 10, then: > > -- exit if version < 10 (\if is ignored and \q is executed) > \if false \echo "prior 10" \q \endif > > -- then test version through a server side expression, will work > SELECT :SERVER_VERSION_NUM < 11 AS prior_11 \gset > \if :prior_11 > -- version 10 > \else > -- version 11 or more > \endif Based upon input from Tom and Fabien, I propose this additional doc patch. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services clarify_psql_server_version.v1.patch Description: Binary data
Re: [HACKERS] Fix performance of generic atomics
On 2017-09-06 16:36, Tom Lane wrote: Sokolov Yura writes: On 2017-09-06 15:56, Tom Lane wrote: The point I'm trying to make is that if tweaking generic.h improves performance then it's an indicator of missed cases in the less-generic atomics code, and the latter is where our attention should be focused. I think basically all of the improvement Sokolov got was from upgrading the coverage of generic-gcc.h. Not exactly. I've checked that the new version of the generic pg_atomic_fetch_or_u32 loop also gives an improvement. But once you put in the generic-gcc version, that's not reached anymore. Yes, you're right. But I think the generic version should still be "fixed". If the generic version is not reached on any platform, then why is it kept? If it is reached somewhere, then it should be improved. -- Sokolov Yura aka funny_falcon Postgres Professional: https://postgrespro.ru The Russian Postgres Company
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On Tue, Sep 5, 2017 at 1:38 PM, Tom Lanewrote: > The complaint I have about PRAGMA is that it's yet another syntax for > accomplishing pretty much the same thing. If you don't like the GUC > solution, we've already got the "comp_option" syntax for static options > in plpgsql. Sure, that's not too pretty, but that's not a good reason > to invent yet another way to do it. On the general question of whether we should have something like this, I expressed a lot of doubt when e6faf910d75027bdce7cd0f2033db4e912592bcc first went in about whether that algorithm was really going to work, and nothing has happened since then to remove any of that doubt. It's pretty clear to me from experiences with customer problems that the heuristics we have often fail to do the right thing, either because queries with no hope of benefiting still replan 5 times - which can waste a ton of CPU when there are many different queries in the plan cache and many sessions - or because queries that would benefit in some cases give up on replanning before they hit a case where a parameter-specific plan helps. I don't think we can just indefinitely continue to resist providing manual control over this behavior on the theory that some day we'll fix it. It's been six years and we haven't made any significant progress. In some cases, a long delay without any progress might just point to a lack of effort that should have been applied, but in this case I think it's because the problem is incredibly hard. I think what we ideally want to do is notice whether the new bind variables cause a change in selectivity which is sufficient to justify a re-plan. If we annotated the original plan with markers indicating that it was valid for all values with a frequency of more than X and less than Y, for example, we could cover most cases involving equality; range queries would need some other kind of annotation. 
However, it's unclear how the planner could produce such annotations, and it's unclear how expensive checking against them would be if we had them. Barring somebody having a brilliant insight about how to make some system that's way better than what we have right now, I think we can't hold out much hope of any better fix than a manual knob. I think a GUC is a decent, though not perfect, mechanism for this. This problem isn't restricted to PL/pgsql; indeed, the cases I've seen have come via prepared queries, not PL/pgsql functions. Even without that, one advantage of a GUC is that they are fairly broadly understood and experienced users understand what they can do with them. They can be set at various different scopes (system, user, database, SET clause for a particular function) and it's relatively convenient to do so. Some kind of PL/pgsql-specific PRAGMA syntax is more likely to be overlooked by users who would actually benefit from it, and also won't cover non-PL/pgsql cases. If we were going to go the PRAGMA route, it would make more sense to me to define that as a way of setting a GUC for the scope of one PL/pgsql block, like PRAGMA SETTING(custom_plan_tries, 0). I think it is in general unfortunate that we don't have a mechanism to change a GUC for the lifespan of one particular query, like this: LET custom_plan_tries = 0 IN SELECT ... I bet a lot of people would find that quite convenient. The problem of needing to set a planner GUC for one particular query is pretty common, and Pavel is absolutely right that having to do SET beforehand and RESET afterward is ugly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
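For readers who have not looked at the heuristic being criticized, the decision logic introduced by e6faf910 can be sketched roughly as below. This is a hedged simplification, not the real plancache.c code: the CachedPlanStats struct, the custom_plan_tries knob (0 = always generic, 5 = current behavior, INT_MAX = always custom), and the 10% fuzz factor are illustrative assumptions.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical knob per the discussion above; not an actual GUC. */
static int	custom_plan_tries = 5;

/* Illustrative stand-in for the bookkeeping a cached plan might carry. */
typedef struct CachedPlanStats
{
	int			num_custom_plans;	/* custom plans built so far */
	double		total_custom_cost;	/* sum of their estimated costs */
	double		generic_cost;		/* estimated cost of the generic plan */
} CachedPlanStats;

/* Return true to build a parameter-specific (custom) plan,
 * false to reuse the cached generic plan. */
static bool
choose_custom_plan(const CachedPlanStats *s)
{
	double		avg_custom_cost;

	if (custom_plan_tries == 0)
		return false;			/* forced generic plan */
	if (s->num_custom_plans < custom_plan_tries)
		return true;			/* still collecting cost samples */

	avg_custom_cost = s->total_custom_cost / s->num_custom_plans;

	/* Keep replanning only if the generic plan looks noticeably worse
	 * than the average custom plan (10% fuzz, illustrative). */
	return s->generic_cost > avg_custom_cost * 1.1;
}
```

With this shape, the settings discussed in this thread fall out naturally: 0 pins the generic plan, 5 reproduces the sample-then-average behavior, and INT_MAX forces a custom plan on every execution.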
Re: [HACKERS] Fix bloom WAL tap test
On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov < a.korot...@postgrespro.ru> wrote: > I just realized that these lines of contrib/bloom/t/001_wal.pl don't > check that queries give same results on master and standby. They just > check that *return codes* of psql are equal. > > # Run test queries and compare their result >> my $master_result = $node_master->psql("postgres", $queries); >> my $standby_result = $node_standby->psql("postgres", $queries); >> is($master_result, $standby_result, "$test_name: query result matches"); > > > Attached patch fixes this problem by using safe_psql() which returns psql > output instead of return code. For safety, this patch replaces psql() with > safe_psql() in other places too. > > I think this should be backpatched to 9.6 as bugfix. > Also, it would be nice to call wal-check on bloom check (now wal-check isn't called even on check-world). See attached patch. My changes to Makefile could be cumbersome. Sorry for that, I don't have much experience with them... -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company wal-check-on-bloom-check.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
Alvaro Herrera writes: > Peter Eisentraut wrote: >> We also need to have a plan for handling the build farm. Maybe keep the >> vcregress.pl upgradecheck target as a thin wrapper for the time being? > The buildfarm already runs "make check" in src/bin/ when TAP tests are > enabled, which should be enough to trigger the rewritten test, no? I think Peter's on about the Windows case. Not sure how that's handled, but it's not "make check". regards, tom lane
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
Robert Haas writes: > I don't think we can just indefinitely continue to resist > providing manual control over this behavior on the theory that some > day we'll fix it. That's fair enough. We need to have a discussion about exactly what the knob does, which is distinct from the question of how you spell the incantation for twiddling it. I'm dubious that a dumb "force a custom plan" setting is going to solve all that many cases usefully. > I think a GUC is a decent, though not perfect, mechanism for this. > This problem isn't restricted to PL/pgsql; indeed, the cases I've seen > have come via prepared queries, not PL/pgsql functions. That's 100% correct, and is actually the best reason not to consider a PRAGMA (or any other plpgsql-specific mechanism) as the incantation spelling. > I think it is in general unfortunate that we don't have a mechanism to > change a GUC for the lifespan of one particular query, like this: > LET custom_plan_tries = 0 IN SELECT ... Hmm. I think the core problem here is that we're trying to control the plancache, which is a pretty much behind-the-scenes mechanism. Except in the case of an explicit PREPARE, you can't even see from SQL that the cache is being used, or when it's used. So part of what needs to be thought about, if we use the GUC approach, is when the GUC's value is consulted. If we don't do anything special then the GUC(s) would be consulted when retrieving plans from the cache, and changes in their values from one retrieval to the next might cause funny behavior. Maybe the relevant settings need to be captured when the plancache entry is made ... not sure. regards, tom lane
Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan
On Wed, Sep 6, 2017 at 11:03 AM, Tom Lane wrote:
> That's fair enough.  We need to have a discussion about exactly what
> the knob does, which is distinct from the question of how you spell
> the incantation for twiddling it.  I'm dubious that a dumb "force a
> custom plan" setting is going to solve all that many cases usefully.

I think what people need is the ability to force the behavior in either direction - either insist on a custom plan, or insist on a generic plan.  The former is what you need if the plan stinks, and the latter is what you need if replanning is a waste of effort.  I have seen both cases.  The latter has been a bigger problem than the former, because the former can be hacked around in various ugly and inefficient ways, but if we're adding a knob I think it should have three settings.  There is perhaps an argument for even more configurability, like altering the number of plan tries from 5 to some other value, but I'm not clear that there's any use case for values other than 0, 5, and +infinity.

>> I think it is in general unfortunate that we don't have a mechanism to
>> change a GUC for the lifespan of one particular query, like this:
>
>> LET custom_plan_tries = 0 IN SELECT ...
>
> Hmm.  I think the core problem here is that we're trying to control
> the plancache, which is a pretty much behind-the-scenes mechanism.
> Except in the case of an explicit PREPARE, you can't even see from
> SQL that the cache is being used, or when it's used.  So part of what
> needs to be thought about, if we use the GUC approach, is when the
> GUC's value is consulted.  If we don't do anything special then
> the GUC(s) would be consulted when retrieving plans from the cache,
> and changes in their values from one retrieval to the next might
> cause funny behavior.  Maybe the relevant settings need to be captured
> when the plancache entry is made ... not sure.

What sort of funny behavior are you concerned about?  It seems likely to me that in most cases the GUC will have the same value every time through, but if it doesn't, I'm not sure why we'd need to use the old value rather than the current one.  Indeed, if the user changes the GUC from "force custom" to "force generic" and reruns the function, we want the new value to take effect, lest a POLA violation occur.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Fix performance of generic atomics
On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> >> It looks to me like two of the three implementations promise no such
> >> thing.
>
> > They're volatile vars, so why not?
>
> Yeah, but so are the caller's variables.  That is, in
>
> pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
> {
>     uint64 old;
>     old = ptr->value;
>
> ISTM that the compiler is required to actually fetch ptr->value, not
> rely on some previous read of it.  I do not think that (the first
> version of) pg_atomic_read_u64_impl is adding any guarantee that wasn't
> there already.
>
> >> Even if they somehow do, it hardly matters given that the cmpxchg loop
> >> would be self-correcting.
>
> > Well, in this one instance maybe, hardly in others.
>
> All the functions involved use nigh-identical cmpxchg loops.
>
> > What are you suggesting as an alternative?
>
> I think we can just use "old = ptr->value" to set up for the cmpxchg
> loop in every generic.h function that uses such a loop.

I think we might have been talking past each other - I thought you were
talking about changing the pg_atomic_read_u64_impl implementation for
external users.

Greetings,

Andres Freund
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 6, 2017 at 3:18 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane wrote:
>>> If somebody's applying apply_projection_to_path to a path that's already
>>> been add_path'd, that's a violation of the documented restriction.
>
>> /me is confused.  Isn't that exactly what grouping_planner() is doing,
>> and has done ever since your original pathification commit
>> (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
>> current_rel->pathlist, so surely everything in there has been
>> add_path()'d.
>
> I think the assumption there is that we no longer care about validity of
> the input Relation, since we won't be looking at it any more (and
> certainly not adding more paths to it).  If there's some reason why
> that's not true, then maybe grouping_planner has a bug there.

Right, that's sorta what I assumed.  But I think that thinking is flawed in the face of parallel query, because of the fact that apply_projection_to_path() pushes down target list projection below Gather when possible.  In particular, as Jeff and Amit point out, it may well be that (a) before apply_projection_to_path(), the cheapest plan is non-parallel and (b) after apply_projection_to_path(), the cheapest plan would be a Gather plan, except that it's too late because we've already thrown that path out.

What we ought to do, I think, is avoid generating gather paths until after we've applied the target list (and the associated costing changes) to both the regular path list and the partial path list.  Then the cost comparison is apples-to-apples.  The use of apply_projection_to_path() on every path in the pathlist would be fine if it were adjusting all the costs by a uniform amount, but it isn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Fix performance of generic atomics
I wrote:
> Ah.  I was not thinking of touching pg_atomic_read_u32/u64_impl,
> although now that you mention it, it's not clear to me why we
> couldn't simplify
>
> -    return *(&ptr->value);
> +    return ptr->value;

Just to check, I applied that change to pg_atomic_read_u32_impl and
pg_atomic_read_u64_impl, and recompiled.  I get bit-for-bit the same
backend executable.  Maybe it would have an effect on some other
compiler, but I doubt it, except perhaps at -O0.

regards, tom lane
Re: [HACKERS] why not parallel seq scan for slow functions
Robert Haas writes:
> On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane wrote:
>> I'm not entirely following.  I thought that add_path was set up to treat
>> "can be parallelized" as an independent dimension of merit, so that
>> parallel paths would always survive.

> Here, the Gather path is not parallel-safe, but rather
> parallel-restricted:

Ah, right, the problem is with the Gather not its sub-paths.

>> Might be a tad messy to rearrange things that way.

> Why do you think I wanted you to do it?  :-)

I'll think about it.

regards, tom lane
Re: [HACKERS] Setting pd_lower in GIN metapage
Amit Langote writes:
> On 2017/08/22 9:39, Michael Paquier wrote:
>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote wrote:
>>> I updated brin_mask() and spg_mask() in the attached updated patches so
>>> that they consider meta pages as containing unused space.

I looked briefly at these patches.  I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that
doesn't have that field filled?

regards, tom lane
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> Hi,
>
> At Mon, 4 Sep 2017 17:17:19 +0900, Michael Paquier wrote in
Re: [HACKERS] Fix performance of generic atomics
Andres Freund writes:
> On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
>> I think we can just use "old = ptr->value" to set up for the cmpxchg
>> loop in every generic.h function that uses such a loop.

> I think we might have been talking past each other - I thought you were
> talking about changing the pg_atomic_read_u64_impl implementation for
> external users.

Ah.  I was not thinking of touching pg_atomic_read_u32/u64_impl,
although now that you mention it, it's not clear to me why we
couldn't simplify

-    return *(&ptr->value);
+    return ptr->value;

AFAIK, the compiler is entitled to, and does, simplify away that
take-a-pointer-and-immediately-dereference-it dance.  If it did not,
a whole lot of standard array locutions would be much less efficient
than they should be.  What matters here is the volatile qualifier,
which we've already got.

regards, tom lane
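[Editor's note: Tom's point about the address-and-dereference dance can be seen with a toy type. This is an illustrative sketch with invented names, not the backend's pg_atomic structs: both accessors denote the identical volatile lvalue, so a conforming compiler performs one volatile load for each.]

```c
#include <stdint.h>

/* Hypothetical stand-in for a pg_atomic_uint64-like struct. */
typedef struct
{
    volatile uint64_t value;
} my_atomic_u64;

/* Take the member's address and immediately dereference it ... */
static uint64_t
read_deref(my_atomic_u64 *ptr)
{
    return *(&ptr->value);
}

/* ... which is the same volatile access as reading the member directly. */
static uint64_t
read_plain(my_atomic_u64 *ptr)
{
    return ptr->value;
}
```

Because `&ptr->value` is just the lvalue `ptr->value` with its address taken, the volatile qualifier (which forces an actual memory read) applies equally in both forms, matching Tom's bit-for-bit-identical compile result.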
Re: [HACKERS] why not parallel seq scan for slow functions
Robert Haas writes:
> In particular, as Jeff and Amit point out, it
> may well be that (a) before apply_projection_to_path(), the cheapest
> plan is non-parallel and (b) after apply_projection_to_path(), the
> cheapest plan would be a Gather plan, except that it's too late
> because we've already thrown that path out.

I'm not entirely following.  I thought that add_path was set up to treat
"can be parallelized" as an independent dimension of merit, so that
parallel paths would always survive.

> What we ought to do, I think, is avoid generating gather paths until
> after we've applied the target list (and the associated costing
> changes) to both the regular path list and the partial path list.

Might be a tad messy to rearrange things that way.

regards, tom lane
Re: [HACKERS] Fix performance of generic atomics
Andres Freund writes:
> On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
>> It looks to me like two of the three implementations promise no such
>> thing.

> They're volatile vars, so why not?

Yeah, but so are the caller's variables.  That is, in

pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
{
    uint64 old;
    old = ptr->value;

ISTM that the compiler is required to actually fetch ptr->value, not
rely on some previous read of it.  I do not think that (the first
version of) pg_atomic_read_u64_impl is adding any guarantee that wasn't
there already.

>> Even if they somehow do, it hardly matters given that the cmpxchg loop
>> would be self-correcting.

> Well, in this one instance maybe, hardly in others.

All the functions involved use nigh-identical cmpxchg loops.

> What are you suggesting as an alternative?

I think we can just use "old = ptr->value" to set up for the cmpxchg
loop in every generic.h function that uses such a loop.

regards, tom lane
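[Editor's note: the self-correcting cmpxchg loop discussed here can be sketched outside the backend with C11 atomics. This is an illustrative stand-in with invented names, not the generic.h code: seeding the loop with a plain read is safe because a failed compare-exchange rewrites "old" with the value actually found in memory.]

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Illustrative fetch-or built from a compare-exchange loop, mirroring the
 * shape of the generic.h fallbacks.  The initial read merely seeds "old";
 * if it were stale (or even torn), the first compare_exchange would fail
 * and store the current value into "old", so the next iteration recovers.
 */
static uint32_t
fetch_or_u32(_Atomic uint32_t *ptr, uint32_t or_)
{
    uint32_t old = atomic_load_explicit(ptr, memory_order_relaxed);

    /* On failure (including spurious weak failures), "old" is refreshed. */
    while (!atomic_compare_exchange_weak(ptr, &old, old | or_))
        /* skip */;

    return old;            /* value before the OR, like fetch_or */
}
```

The point of the thread is that the seed read need not itself be an "atomic" (locked or CAS-based) read, precisely because the loop corrects any bad seed.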
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane wrote:
> Robert Haas writes:
>> In particular, as Jeff and Amit point out, it
>> may well be that (a) before apply_projection_to_path(), the cheapest
>> plan is non-parallel and (b) after apply_projection_to_path(), the
>> cheapest plan would be a Gather plan, except that it's too late
>> because we've already thrown that path out.
>
> I'm not entirely following.  I thought that add_path was set up to treat
> "can be parallelized" as an independent dimension of merit, so that
> parallel paths would always survive.

It treats parallel-safety as an independent dimension of merit; a parallel-safe plan is more meritorious than one of equal cost which is not.  We need that because, for example, forming a partial path for a join means joining a partial path to a parallel-safe path.  But that doesn't help us here; that's to make sure we can build the necessary stuff *below* the Gather.  IOW, if we threw away parallel-safe paths because there was a cheaper parallel-restricted path, we might be unable to build a partial path for the join *at all*.

Here, the Gather path is not parallel-safe, but rather parallel-restricted: it's OK for it to exist in a plan that uses parallelism (duh), but it can't be nested under another Gather (also duh, kinda).  So before accounting for the differing projection cost, the Gather path is doubly inferior: it is more expensive AND not parallel-safe, whereas the competing non-parallel plan is both cheaper AND parallel-safe.  After applying the expensive target list, the parallel-safe plan gets a lot more expensive, but the Gather path gets more expensive to a lesser degree because the projection step ends up below the Gather and thus happens in parallel, so now the Gather plan, still a loser on parallel-safety, is a winner on total cost and thus ought to have been retained and, in fact, ought to have won.  Instead, we threw it out too early.

>> What we ought to do, I think, is avoid generating gather paths until
>> after we've applied the target list (and the associated costing
>> changes) to both the regular path list and the partial path list.
>
> Might be a tad messy to rearrange things that way.

Why do you think I wanted you to do it?  :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] why not parallel seq scan for slow functions
On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila wrote:
> Okay, now I understand your point, but I think we already change the
> cost of paths in apply_projection_to_path which is done after add_path
> for top level scan/join paths.

Yeah.  I think that's a nasty hack, and I think it's Tom's fault.  :-)

It's used in various places with comments like this:

    /*
     * The path might not return exactly what we want, so fix that.  (We
     * assume that this won't change any conclusions about which was the
     * cheapest path.)
     */

And in another place:

     * In principle we should re-run set_cheapest() here to identify the
     * cheapest path, but it seems unlikely that adding the same tlist
     * eval costs to all the paths would change that, so we don't bother.

I think these assumptions were a little shaky even before parallel query came along, but they're now outright false, because we're not adding the *same* tlist eval costs to all paths any more.  The parallel paths are getting smaller costs.  That probably doesn't matter much if the expressions in question are things like a + b, but when as in Jeff's example it's slow(a), then it matters a lot.

I'd feel a lot happier if Tom were to decide how this ought to be fixed, because - in spite of some modifications by various parallel query code - this is basically all his design and mostly his code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Fix performance of generic atomics
Hi,

On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> I wrote:
>> Anyway, I don't have a big objection to applying this.  My concern
>> is more that we need to be taking a harder look at other parts of
>> the atomics infrastructure, because tweaks there are likely to buy
>> much more.
>
> I went ahead and pushed these patches, adding __sync_fetch_and_sub
> since gcc seems to provide that on the same footing as these other
> functions.
>
> Looking at these generic functions a bit closer, I notice that
> most of them are coded like
>
>     old = pg_atomic_read_u32_impl(ptr);
>     while (!pg_atomic_compare_exchange_u32_impl(ptr, &old, old | or_))
>         /* skip */;
>
> but there's an exception: pg_atomic_exchange_u64_impl just does
>
>     old = ptr->value;
>     while (!pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_))
>         /* skip */;
>
> That's interesting.  Why not pg_atomic_read_u64_impl there?

Because our platforms don't generally guarantee that 64bit reads are atomic:

    /*
     * 64 bit reads aren't safe on all platforms. In the generic
     * implementation implement them as a compare/exchange with 0. That'll
     * fail or succeed, but always return the old value. Possible might store
     * a 0, but only if the prev. value also was a 0 - i.e. harmless.
     */
    pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);

so I *guess* I decided doing an unnecessary cmpxchg wasn't useful.  But since then we've added an optimization for platforms where we know 64bit reads have single copy atomicity.  So we could change that now.

> I think that the simple read is actually okay as it stands: if we
> are unlucky enough to get a torn read, the first compare/exchange
> will fail to compare equal, and it will replace "old" with a valid
> atomically-read value, and then the next loop iteration has a chance
> to succeed.  Therefore there's no need to pay the extra cost of a
> locked read instead of an unlocked one.
>
> However, if that's the reasoning, why don't we make all of these
> use simple reads?  It seems unlikely that a locked read is free.

We don't really use locked reads?  All the _atomic_ wrapper forces is an actual read from memory rather than a register.

Greetings,

Andres Freund
Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.
Thus short, simple but meaningful examples which show how to do something useful with all that in the documentation may help people take advantage of these new features.

I don't have an objection to providing an example.  I wasn't terribly impressed with Simon's version, but maybe we can do better.

That was just a quick idea to show how they could be used, probably it can be improved.

I think it should be concrete, not partly abstract; and likely it needs to be with \if not the variable proper.

ISTM that currently there is no "not".  Maybe I do not understand your sentence.

Given my experience with "\d*", I'm not sure I would assume that a new psql feature would work with older servers.

I don't recall anyone questioning whether PQserverVersion() would work with older servers.  Being able to tell what version the server is is sort of the point, no?  If there were a restriction on what it would work with, I would agree that that needs to be documented.  But I don't think "yes, it really works" is a helpful use of documentation space.

Sure.  I can use psql without knowing that there is a libpq underneath, that it contains a PQserverVersion function implemented since pg7, and that the SERVER_VERSION_* variables introduced in pg10 or pg11 rely on that thus it should work with pg7/8/9 as well.  It is not obvious, as a new feature could depend on any combination of server side functions, protocol extension, client library functions or client code...

--
Fabien.
Re: [HACKERS] why not parallel seq scan for slow functions
On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila wrote:
>>> Okay, now I understand your point, but I think we already change the
>>> cost of paths in apply_projection_to_path which is done after add_path
>>> for top level scan/join paths.
>
>> Yeah.  I think that's a nasty hack, and I think it's Tom's fault.  :-)
>
> Yeah, and it's also documented:
>
> * This has the same net effect as create_projection_path(), except that if
> * a separate Result plan node isn't needed, we just replace the given path's
> * pathtarget with the desired one.  This must be used only when the caller
> * knows that the given path isn't referenced elsewhere and so can be modified
> * in-place.
>
> If somebody's applying apply_projection_to_path to a path that's already
> been add_path'd, that's a violation of the documented restriction.

/me is confused.  Isn't that exactly what grouping_planner() is doing, and has done ever since your original pathification commit (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over current_rel->pathlist, so surely everything in there has been add_path()'d.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] Add citext_pattern_ops to citext contrib module
On Tue, Jul 18, 2017 at 5:18 AM, Alexey Chernyshov wrote:
> Hi all,

Hi Alexey,

I took a look at your patch.  Builds fine here, and passes the new tests.  I'm new to this code, so take my review with a grain of salt.

> The attached patch introduces citext_pattern_ops for citext extension type
> like text_pattern_ops for text type.  Here are operators ~<~, ~<=~, ~>~, ~>=~
> combined into citext_pattern_ops operator class.  These operators simply
> compare underlying citext values as C strings with memcmp() function.

Are there any cases where performing the str_tolower with the default collation, then comparing byte-by-byte, could backfire?  The added test cases don't make use of any multibyte/accented characters, so it's not clear to me yet what *should* be happening in those cases.

It also might be a good idea to add some test cases that compare strings of different lengths, to exercise all the paths in internal_citext_pattern_cmp().

> +-- test citext_pattern_cmp() function explicetily.

Spelling nitpick in the new SQL: s/explicetily/explicitly .

--Jacob

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
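[Editor's note: for reference, a rough ASCII-only sketch of the comparison being reviewed. The actual patch folds case via str_tolower() under the default collation and must cope with multibyte text; this toy uses tolower() and is only meant to show the distinct-character and length-tiebreak paths that Jacob suggests testing.]

```c
#include <ctype.h>
#include <string.h>

/*
 * Simplified, hypothetical analogue of internal_citext_pattern_cmp():
 * case-fold both strings (here: bytewise tolower, ASCII only), compare
 * byte-by-byte like memcmp(), then break ties on length.
 */
static int
citext_pattern_cmp_sketch(const char *a, const char *b)
{
    size_t  la = strlen(a);
    size_t  lb = strlen(b);
    size_t  n = (la < lb) ? la : lb;

    for (size_t i = 0; i < n; i++)
    {
        int     ca = tolower((unsigned char) a[i]);
        int     cb = tolower((unsigned char) b[i]);

        if (ca != cb)                /* first differing folded byte decides */
            return (ca < cb) ? -1 : 1;
    }
    if (la != lb)                    /* common prefix equal: shorter sorts first */
        return (la < lb) ? -1 : 1;
    return 0;
}
```

The two return points after the loop are exactly the different-length paths that the review asks to be covered by test cases.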
Re: [HACKERS] why not parallel seq scan for slow functions
Robert Haas writes:
> On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila wrote:
>> Okay, now I understand your point, but I think we already change the
>> cost of paths in apply_projection_to_path which is done after add_path
>> for top level scan/join paths.

> Yeah.  I think that's a nasty hack, and I think it's Tom's fault.  :-)

Yeah, and it's also documented:

 * This has the same net effect as create_projection_path(), except that if
 * a separate Result plan node isn't needed, we just replace the given path's
 * pathtarget with the desired one.  This must be used only when the caller
 * knows that the given path isn't referenced elsewhere and so can be modified
 * in-place.

If somebody's applying apply_projection_to_path to a path that's already been add_path'd, that's a violation of the documented restriction.

It might be that we should just get rid of apply_projection_to_path and use create_projection_path, which is less mistake-prone at the cost of manufacturing another level of Path node.  Now that that has the dummypp flag, it really shouldn't make any difference in terms of the accuracy of the cost estimates.

> I'd feel a lot happier if Tom were to decide how this ought to be
> fixed, because - in spite of some modifications by various parallel
> query code - this is basically all his design and mostly his code.

I can take a look, but not right away.

regards, tom lane
Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour
Simon Riggs writes:
> Other than that, I'm good to commit.
> Any last minute objections?

The docs and comments could stand a bit closer copy-editing by a
native English speaker.  Other than that, it seemed sane in a
quick read-through.

regards, tom lane
Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour
On 6 September 2017 at 10:27, Tom Lane wrote:
> Simon Riggs writes:
>> Other than that, I'm good to commit.
>> Any last minute objections?
>
> The docs and comments could stand a bit closer copy-editing by a
> native English speaker.  Other than that, it seemed sane in a
> quick read-through.

Will do

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Fix performance of generic atomics
Hi Jeff,

On 09/05/2017 03:47 PM, Jeff Janes wrote:
>> I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using
>> both logged and unlogged tables.  Also ran an internal benchmark which
>> didn't show anything either.
>
> What scale factor and client count?  How many cores per socket?  It looks
> like Sokolov was just starting to see gains at 200 clients on 72 cores,
> using -N transaction.

I have done a run with scale factor 300, and another with 3000 on a 2S/28C/56T/256Gb w/ 2 x RAID10 SSD machine; up to 200 clients.  I would consider the runs as "noise" as I'm seeing +-1% for all client counts, so nothing like Yura is seeing in [1] for the higher client counts.

I did a run with -N too using scale factor 300, using the settings in [1], but with same result (+-1%).

[1] https://www.postgresql.org/message-id/d62d7d9d473d07e172d799d5a57e7...@postgrespro.ru

Best regards,
Jesper
Re: [HACKERS] Fix performance of generic atomics
Andres Freund writes:
> On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
>> However, if that's the reasoning, why don't we make all of these
>> use simple reads?  It seems unlikely that a locked read is free.

> We don't really use locked reads?  All the _atomic_ wrapper forces is an
> actual read from memory rather than a register.

It looks to me like two of the three implementations promise no such thing.  Even if they somehow do, it hardly matters given that the cmpxchg loop would be self-correcting.

Mostly, though, I'm looking at the fallback pg_atomic_read_u64_impl implementation (with a CAS), which seems far more expensive than can be justified for this.

regards, tom lane
Re: [HACKERS] why not parallel seq scan for slow functions
Robert Haas writes:
> On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane wrote:
>> If somebody's applying apply_projection_to_path to a path that's already
>> been add_path'd, that's a violation of the documented restriction.

> /me is confused.  Isn't that exactly what grouping_planner() is doing,
> and has done ever since your original pathification commit
> (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
> current_rel->pathlist, so surely everything in there has been
> add_path()'d.

I think the assumption there is that we no longer care about validity of the input Relation, since we won't be looking at it any more (and certainly not adding more paths to it).  If there's some reason why that's not true, then maybe grouping_planner has a bug there.

regards, tom lane
Re: [HACKERS] Fix performance of generic atomics
On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> Andres Freund writes:
>> On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
>>> However, if that's the reasoning, why don't we make all of these
>>> use simple reads?  It seems unlikely that a locked read is free.
>
>> We don't really use locked reads?  All the _atomic_ wrapper forces is an
>> actual read from memory rather than a register.
>
> It looks to me like two of the three implementations promise no such
> thing.

They're volatile vars, so why not?

> Even if they somehow do, it hardly matters given that the cmpxchg loop
> would be self-correcting.

Well, in this one instance maybe, hardly in others.

> Mostly, though, I'm looking at the fallback pg_atomic_read_u64_impl
> implementation (with a CAS), which seems far more expensive than can
> be justified for this.

What are you suggesting as an alternative?

- Andres
Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump
On Thu, Sep 7, 2017 at 1:43 AM, Robert Haas wrote:
> On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs wrote:
>>>> I'd personally be fine with --no-whatever for any whatever that might
>>>> be a subsidiary property of database objects.  We've got
>>>> --no-security-labels, --no-tablespaces, --no-owner, and
>>>> --no-privileges already, so what's wrong with --no-comments?  (We've
>>>> also got --no-publications; I think it's arguable whether that is the
>>>> same kind of thing.)
>>>
>>> And --no-subscriptions in the same bucket.
>>
>> Yes, it is.  I was suggesting that we remove those as well.

FWIW, I do too.  They are useful for given application code paths.

> That seems like a non-starter to me.  I have used those options many
> times to solve real problems, and I'm sure other people have as well.
> We wouldn't have ended up with all of these options if users didn't
> want to control such things.

As there begin to be many switches of this kind and much code duplication, I think that some refactoring into a more generic switch infrastructure would be nicer.

--
Michael
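[Editor's note: one possible shape for the generic switch infrastructure Michael alludes to. This is a purely hypothetical sketch with invented names, not pg_dump's actual option handling: a single table of suppressible properties replaces a hand-written flag and branch per "--no-<thing>" switch.]

```c
#include <stdbool.h>
#include <string.h>

/* One entry per suppressible object property, e.g. --no-comments. */
typedef struct
{
    const char *name;   /* the <thing> in --no-<thing> */
    bool        skip;   /* set when the switch is seen */
} NoOption;

static NoOption no_options[] = {
    {"comments", false},
    {"owner", false},
    {"privileges", false},
    {"security-labels", false},
    {"tablespaces", false},
};

/*
 * Match arg against the table; returns true (and records the suppression)
 * if arg is a recognized --no-<name> switch, false otherwise.
 */
static bool
handle_no_option(const char *arg)
{
    if (strncmp(arg, "--no-", 5) != 0)
        return false;

    for (size_t i = 0; i < sizeof(no_options) / sizeof(no_options[0]); i++)
    {
        if (strcmp(arg + 5, no_options[i].name) == 0)
        {
            no_options[i].skip = true;
            return true;
        }
    }
    return false;           /* unknown --no-<thing> */
}
```

Adding a new suppression switch would then mean adding one table row, rather than duplicating parsing, a boolean, and a conditional in the dump path.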
Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.
On 9/4/17 06:03, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Michael Paquier writes:
>>> I don't like breaking the abstraction of pg_log() with the existing
>>> flags with some kind of pg_debug() layer.  The set of APIs present now
>>> in pg_rewind for logging is nice to have, and I think that those debug
>>> messages should be translated.  So what about the attached?
>>
>> Your point about INT64_FORMAT not necessarily working with fprintf
>> is an outstanding reason not to keep it like it is.  I've not reviewed
>> this patch in detail but I think this is basically the way to fix it.
>
> Actually this code goes through vsnprintf, not fprintf, which should be
> safe, so I removed that part of the comment, and pushed.

Is there a reason this was not backpatched to 9.5?

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.
On Thu, Sep 7, 2017 at 10:11 AM, Peter Eisentraut wrote:
> On 9/4/17 06:03, Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Michael Paquier writes:
>>>> I don't like breaking the abstraction of pg_log() with the existing
>>>> flags with some kind of pg_debug() layer.  The set of APIs present now
>>>> in pg_rewind for logging is nice to have, and I think that those debug
>>>> messages should be translated.  So what about the attached?
>>>
>>> Your point about INT64_FORMAT not necessarily working with fprintf
>>> is an outstanding reason not to keep it like it is.  I've not reviewed
>>> this patch in detail but I think this is basically the way to fix it.
>>
>> Actually this code goes through vsnprintf, not fprintf, which should be
>> safe, so I removed that part of the comment, and pushed.
>
> Is there a reason this was not backpatched to 9.5?

Indeed.  Please note that cherry-picking the fix from 23c1f0a works just fine.

--
Michael
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On Wed, Sep 6, 2017 at 11:44 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Peter Eisentraut wrote: >>> We also need to have a plan for handling the build farm. Maybe keep the >>> vcregress.pl upgradecheck target as a thin wrapper for the time being? > >> The buildfarm already runs "make check" in src/bin/ when TAP tests are >> enabled, which should be enough to trigger the rewritten test, no? > > I think Peter's on about the Windows case. Not sure how that's handled, > but it's not "make check". For MSVC, one can use "vcregress.bat upgradecheck". So perhaps we could keep upgradecheck for a short time but make it a noop instead with this patch, and then remove it once buildfarm animals are upgraded to a newer client version? I would prefer seeing a simple removal of upgradecheck at the end, and put all TAP tests for binaries under the bincheck path. This feels more natural. -- Michael
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On Wed, Sep 6, 2017 at 10:05 PM, Peter Eisentraut wrote: > Please send a rebased patch. > > I propose splitting the single Perl script into three separate test > files: one for basic command-line option handling and so on (I would > like to expand that later), one for the main upgrade test, and one for > the funny database names tests. That makes sense. There will be additional overhead with the creation of an extra server though. > In the testing file, you removed the paragraph that explains how to do > cross-version upgrade testing. It's unfortunate that we would lose that > functionality. What can we do about that? Right, simply removing support for something which has been here for a long time is no fun. I think that we should add a new bindir variable to PostgresNode objects, used to define the path to the binaries. Any new node created needs to go through init() or init_from_backup(), so a node created with init() would set this bindir to what pg_config in PATH reports, or to the value defined by the caller if it is defined (let's use an option for %params). A node created from init_from_backup() inherits the path of its root node. This requires a bit of refactoring first. This could also help with cross-version tests outside of core. In the existing scripts, there are the following variables:
- oldsrc, old version's source tree
- oldbindir, old version's installed bin dir
- bindir, this version's installed bin dir
- libdir, this version's installed lib dir
bindir and libdir point to the currently installed version in the tree, so we could do without them, no? oldbindir and oldsrc need to be kept around to enforce the position of binaries for the old version, as well as a proper shape of the original dump being compared (+ drop of the past functions). Then, for the pg_upgrade tests, let's check for ENV{oldbindir} and enforce the bindir value of the PostgresNode to-be-upgraded.
And also for ENV{oldsrc}, first check if it is defined, and then do the existing psql/dump changes. So one, in order to run cross-version checks, would just need to rely on the fact that the version where installcheck runs is the new version. Does that sound acceptable? Looking at 5bab198, those don't run that often, but I definitely agree that breaking something for no apparent reason is not cool either ;p > We also need to have a plan for handling the build farm. Maybe keep the > vcregress.pl upgradecheck target as a thin wrapper for the time being? Or we could make upgradecheck a noop, then remove it once all the MSVC animals have upgraded to a newer version of the buildfarm client which does not use upgradecheck anymore (I am fine with sending a patch or a pull request to Andrew for that). -- Michael
Re: [HACKERS] Fix performance of generic atomics
I wrote: > Anyway, I don't have a big objection to applying this. My concern > is more that we need to be taking a harder look at other parts of > the atomics infrastructure, because tweaks there are likely to buy > much more. I went ahead and pushed these patches, adding __sync_fetch_and_sub since gcc seems to provide that on the same footing as these other functions. Looking at these generic functions a bit closer, I notice that most of them are coded like old = pg_atomic_read_u32_impl(ptr); while (!pg_atomic_compare_exchange_u32_impl(ptr, &old, old | or_)) /* skip */; but there's an exception: pg_atomic_exchange_u64_impl just does old = ptr->value; while (!pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_)) /* skip */; That's interesting. Why not pg_atomic_read_u64_impl there? I think that the simple read is actually okay as it stands: if we are unlucky enough to get a torn read, the first compare/exchange will fail to compare equal, and it will replace "old" with a valid atomically-read value, and then the next loop iteration has a chance to succeed. Therefore there's no need to pay the extra cost of a locked read instead of an unlocked one. However, if that's the reasoning, why don't we make all of these use simple reads? It seems unlikely that a locked read is free. If there's actually a reason for pg_atomic_exchange_u64_impl to be different from the rest, it needs to have a comment explaining why. regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.
Fabien COELHO writes: > Thus short, simple but meaningful examples which show how to do something > useful with all that in the documentation may help people take advantage > of these new features. I don't have an objection to providing an example. I wasn't terribly impressed with Simon's version, but maybe we can do better. I think it should be concrete, not partly abstract; and likely it needs to be with \if not the variable proper. > Given my experience with "\d*", I'm not sure I would assume that a new > psql feature would work with older servers. I don't recall anyone questioning whether PQserverVersion() would work with older servers. Being able to tell what version the server is is sort of the point, no? If there were a restriction on what it would work with, I would agree that that needs to be documented. But I don't think "yes, it really works" is a helpful use of documentation space. regards, tom lane
Re: [HACKERS] Pluggable storage
On Fri, Sep 1, 2017 at 1:51 PM, Haribabu Kommi wrote: > Here I attached new set of patches that are rebased to the latest master. Hi Haribabu, Could you please post a new rebased version? 0007-Scan-functions-are-added-to-storage-AM.patch conflicts with recent changes. Thanks! -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Thu, Sep 7, 2017 at 10:42 AM, Amit Langote wrote: > I too tend to think that any users who use this masking facility would > know to expect to get these failures on upgraded clusters with invalid > pd_lower in meta pages. Yes, I don't think that an optimization reducing WAL that impacts all users should be stopped by a small set of users who use an option for development purposes. > (PS: I wonder if it is reasonable to allow configuring the error level > used when a masking failure occurs? Currently, checkXLogConsistency() > will abort the process (FATAL)) It definitely is worth it in my opinion, perhaps with an on/off switch to trigger a warning instead. The reason why we use FATAL now is to raise red flags more easily in buildfarm runs: a startup process facing FATAL takes down the standby. >> Perhaps we should document this point for wal_consistency_check? > > Do you mean permanently under wal_consistency_check parameter > documentation or in the release notes under incompatibilities for the > affected index types? Under the parameter itself, in the spirit of a don't-do-that from an upgraded instance. -- Michael
Re: [HACKERS] The case for removing replacement selection sort
On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro wrote: >> I attach a patch to remove replacement selection, which I'll submit to CF 1. > > This breaks the documentation build, because > doc/src/sgml/release-9.6.sgml still contains <xref linkend="guc-replacement-sort-tuples"> but you removed that id. Thanks for looking into it. I suppose that the solution is to change the 9.6 release notes to no longer have a replacement_sort_tuples link. Anyone else have an opinion on that? -- Peter Geoghegan
Re: [HACKERS] Red-Black tree traversal tests
[ btw, please don't cc pgsql-hackers-owner, the list moderators don't need the noise ]

Aleksander Alekseev writes: > I would say that this patch is in a pretty good shape now. And I see a > 99% code coverage of rbtree.c. Let's see what committers think.

I took a quick look through the patch --- haven't tried to compile it or anything like that --- and have a few comments:

* There's some typos, eg extention should be extension, triversal should be traversal. Maybe try a spell checker?

* The diff complains about lack of file-ending newlines in several places.

* There's something weird at the start of test.c: @@ -0,0 +1,577 @@ +/*-- Maybe your compiler thinks that's a BOM? It's hard to see how it compiles otherwise.

* I think it might be simpler to have the module expose just one SQL function that invokes all these individual tests internally. Less boilerplate text that way, and less to change if you add more tests later. Also, you might consider passing in TEST_SIZE as an argument of the SQL function instead of having it be hard-wired.

* We don't do C++-style (//) comments around here. Please use C style. (You might consider running the code through pgindent, which will convert those comments automatically.)

* It's also generally not per project style to use malloc/calloc/free directly in backend code; and it's certainly not cool to use malloc or calloc and then not check for a null result. Use palloc instead. Given the short runtime of this test, you likely needn't be too worried about pfree'ing stuff.

* _PG_init() declaration seems to be a leftover? doesn't belong here either, as postgres.h will bring that in for you.

* I know next to zip about red-black trees, but it disturbs me that the traversal tests use trees whose values were inserted in strictly increasing order. Seems like that's not proving as much as it could. I wonder how hard it would be to insert those integers in random order.
* I'm not too pleased that the rb_find calls mostly just assume that the find won't return NULL. You should be testing for that and reporting the problem, not just dying on a null pointer crash if it happens.

* Possibly the tests should exercise rb_delete on keys *not* present. And how about insertion of already-existing keys? Although that's a bit outside the scope of testing traversal, so if you want to leave it for some future expansion, that'd be OK.

I'll set this back to Waiting on Author. I do encourage you to finish it up.

regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
Hi, I know I'm 7 months late to this, but only just read the beta 4 release notes. Is there anything people using float datetimes can do that isn't a pg_dumpall | pg_restore to do a less painful update? We have several TB of data still using float datetimes and I'm trying to figure out how we can move to 10 (currently on 9.6.x) without massive amounts of $ in duplicated hardware or downtime. I did attempt a pg_dumpall | pg_restore at one point but for whatever reason we had data in tables that integer datetimes fails on (I forget the exact crash, but the datetime values were either too small or too large to fit into the integer datetimes field -- I can retry this if it would be helpful). Thanks. Regards, Omar On Mon, Feb 27, 2017 at 5:13 PM, Andres Freund wrote: > On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote: >> On 02/22/2017 02:45 PM, Tom Lane wrote: >> > Andres Freund writes: >> > > On 2017-02-22 08:43:28 -0500, Tom Lane wrote: >> > > > (To be concrete, I'm suggesting dropping --disable-integer-datetimes >> > > > in HEAD, and just agreeing that in the back branches, use of >> > > > replication >> > > > protocol with float-timestamp servers is not supported and we're not >> > > > going to bother looking for related bugs there. Given the lack of >> > > > field >> > > > complaints, I do not believe anyone cares.) >> > >> > What I *am* willing to spend time on is removing float-timestamp code >> > in HEAD. I've not yet heard anybody speak against doing that (or at >> > least, nothing I interpreted as a vote against it). If I've not heard >> > any complaints by tomorrow, I'll get started on that. >> >> Rip it out!
> > Already happened: > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3
Re: [HACKERS] merge psql ef/ev sf/sv handling functions
Fabien COELHO writes: > Here is an attempt at merging some functions which removes 160 lines of > code. Pushed with minor adjustments. I thought a couple of variable names could be better chosen, but mostly, you can't do this sort of thing: -psql_error("The server (version %s) does not support editing function source.\n", +psql_error("The server (version %s) does not support editing %s.\n", formatPGVersionNumber(pset.sversion, false, - sverbuf, sizeof(sverbuf))); + sverbuf, sizeof(sverbuf)), + is_func ? "function source" : "view definitions"); It's too much of a pain in the rear for translators. See https://www.postgresql.org/docs/devel/static/nls-programmer.html#nls-guidelines Usually we just use two independent messages, and that's what I did. regards, tom lane
Re: [HACKERS] The case for removing replacement selection sort
On Fri, Sep 1, 2017 at 6:00 AM, Peter Geoghegan wrote: > On Wed, Aug 30, 2017 at 4:59 PM, Peter Geoghegan wrote: >> I may submit the simple patch to remove replacement selection, if >> other contributors are receptive. Apart from everything else, the >> "incrementalism" of replacement selection works against cleverer batch >> memory management of the type I just outlined, which seems like a >> useful direction to take tuplesort in. > > I attach a patch to remove replacement selection, which I'll submit to CF 1. This breaks the documentation build, because doc/src/sgml/release-9.6.sgml still contains <xref linkend="guc-replacement-sort-tuples"> but you removed that id. -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] Minor code improvement to postgresGetForeignPlan
Tatsuro Yamada writes: > The declaration of postgresGetForeignPlan uses baserel, but > the actual definition uses foreignrel. It would be better to sync. Pushed, thanks. regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
Omar Kilani writes: > Is there anything people using float datetimes can do that isn't a > pg_dumpall | pg_restore to do a less painful update? Um, not really. You may be stuck on 9.6 until you can spare the effort to convert. The physical representations of timestamps are totally different in the two cases. > I did attempt a pg_dumpall | pg_restore at one point but for whatever > reason we had data in tables that integer datetimes fails on (I forget > the exact crash, but the datetime values were either too small or too > large to fit into the integer datetimes field -- I can retry this if > it would be helpful). I'm pretty sure the minimum values are the same in both cases, to wit Julian day zero. As for the max, according to the old code comments * The upper limit for dates is 5874897-12-31, which is a bit less than what * the Julian-date code can allow. We use that same limit for timestamps when * using floating-point timestamps ... For integer timestamps, the upper * limit is 294276-12-31. I would hope that any timestamps you've got beyond 294276 AD are erroneous data that you need to clean up anyway. regards, tom lane
Re: [HACKERS] Replication vs. float timestamps is a disaster
On 09/06/17 18:33, Omar Kilani wrote: > Is there anything people using float datetimes can do that isn't a > pg_dumpall | pg_restore to do a less painful update? > > We have several TB of data still using float datetimes and I'm trying > to figure out how we can move to 10 (currently on 9.6.x) without > massive amounts of $ in duplicated hardware or downtime. I ran into an analogous issue with endianness of PL/Java-defined datatypes, and ended up devising a procedure [1] for treating the type one way on output and another way on input, and then constructing an UPDATE query that just assigns the column to itself using a function that's essentially identity, but forces the output-input conversion (and doesn't look like a simple identity function to the query optimizer, which otherwise might optimize the whole update away). While the details of that were specific to PL/Java and byte order, something similar might work in your case if you can afford some period, either just before or just after migration, when your normal workload is blocked, long enough to run such updates on your several TB of data. Or if that's too big a chunk of downtime, you might be able to add a column in advance, of some user-defined type the same width as an integer datetime, populate that in advance, migrate, then do a quick catalog switcheroo of the column names and types. I've never done that, so someone else might have comments on its feasibility or the best way of doing it. > the exact crash, but the datetime values were either too small or too > large to fit into the integer datetimes field -- I can retry this if That does seem important to get the details on. If the integer datetime column won't represent your stuff (and the too-large or too-small values are necessary to represent), then some schema change might be necessary. 
Or, if the outlying values were sentinel values of some kind (I wonder, how else would you even have datetimes outside of the int64 range?), maybe they can be replaced with others that fit. -Chap [1]: https://tada.github.io/pljava/use/byteordermigrate.html
Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Hello, At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund wrote in <20170906192353.ufp2dq7wm5fd6...@alap3.anarazel.de> > On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote: > > The problem is that the current ReadRecord needs the first one of > > a series of continuation records from the same source with the > > other part, the master in the case. > > What's the problem with that? We can easily keep track of the beginning > of a record, and only confirm the address before that. After a failure while reading a record locally, ReadRecord tries streaming to read from the beginning of the record, which is not on the master, then retries locally, and so on. This loops forever. > > A (or the) solution closed in the standby side is allowing to > > read a series of continuation records from multiple sources. > > I'm not following. All we need to use is the beginning of the relevant > records, that's easy enough to keep track of. We don't need to read the > WAL or anything. The beginning is already tracked and nothing more needs to be done. I reconsidered that approach and found that it doesn't need such destructive refactoring. The first *problem* was that WaitForWALToBecomeAvailable requests the beginning of a record, which is not on the page the function has been told to fetch. While tliRecPtr is still required to determine the TLI to request, the function should request RecPtr to be streamed. The rest is to let XLogPageRead retry other sources immediately. To do this I made ValidXLogPageHeader@xlogreader.c public (and renamed it to XLogReaderValidatePageHeader). The attached patch fixes the problem and passes the recovery tests. However, a test for this problem is not added: it needs to go to the last page in a segment, then put a record that continues into the next segment, then kill the standby after it receives the previous segment but before it receives the whole record.
regards, -- Kyotaro Horiguchi NTT Open Source Software Center >From 8932a390bd3d1acfe5722bc62f42fc7e381ca617 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi Date: Thu, 7 Sep 2017 12:14:55 +0900 Subject: [PATCH 1/2] Allow switch WAL source midst of record. The corrent recovery machinary assumes the whole of a record is avaiable from single source. This prevents a standby from restarting under a certain condition. This patch allows source switching during reading a series of continuation records. --- src/backend/access/transam/xlog.c | 7 ++- src/backend/access/transam/xlogreader.c | 12 +--- src/include/access/xlogreader.h | 5 + 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index df4843f..eef3a97 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -11566,6 +11566,11 @@ retry: Assert(reqLen <= readLen); *readTLI = curFileTLI; + + if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, + (XLogPageHeader) readBuf)) + goto next_record_is_invalid; + return readLen; next_record_is_invalid: @@ -11700,7 +11705,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, } else { - ptr = tliRecPtr; + ptr = RecPtr; tli = tliOfPointInHistory(tliRecPtr, expectedTLEs); if (curFileTLI > 0 && tli < curFileTLI) diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 0781a7b..aa05e3f 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -27,8 +27,6 @@ static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength); -static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr, - XLogPageHeader hdr); static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess); static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record, @@ -545,7 +543,7 @@ 
ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) hdr = (XLogPageHeader) state->readBuf; - if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr)) + if (!XLogReaderValidatePageHeader(state, targetSegmentPtr, hdr)) goto err; } @@ -582,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen) /* * Now that we know we have the full header, validate it. */ - if (!ValidXLogPageHeader(state, pageptr, hdr)) + if (!XLogReaderValidatePageHeader(state, pageptr, hdr)) goto err; /* update read state information */ @@ -709,9 +707,9 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr) /* * Validate a page header */ -static bool -ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr, - XLogPageHeader hdr) +bool +XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr, + XLogPageHeader hdr) { XLogRecPtr recaddr;
Re: [HACKERS] Setting pd_lower in GIN metapage
Michael Paquier writes: > On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane wrote: >> I looked briefly at these patches. I'm not sure that it's safe for the >> mask functions to assume that meta pages always have valid pd_lower. >> What happens when replaying log data concerning an old index that doesn't >> have that field filled? > There will be inconsistency between the pages, and the masking check > will complain. That doesn't seem like a pleasant outcome to me. The WAL consistency check code is supposed to complain if there's some kind of replication or replay failure, and this cannot be categorized as either. The idea I'd had was to apply the masking only if pd_lower >= SizeOfPageHeaderData, or if you wanted to be stricter, only if pd_lower != 0. regards, tom lane
[HACKERS] Re: DROP SUBSCRIPTION hangs if sub is disabled in the same transaction
On Wed, Sep 06, 2017 at 03:28:47PM +0900, Masahiko Sawada wrote: > On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher wrote: > > Arseny Sher writes: > > > >> Attached patch fixes this by stopping workers before RO drop, as > >> already done in case when we drop replication slot. > > > > Sorry, here is the patch. > > > > I could reproduce this issue, it's a bug. Added this to the open items. > The cause of this is commit 7e174fa7 which makes subscription DDLs kill > the apply worker only at transaction commit. I didn't look at the patch > deeply yet but I'm not sure the approach that kills the apply worker > before commit would be good. [Action required within three days. This is a generic notification.] The above-described topic is currently a PostgreSQL 10 open item. Peter, since you committed the patch believed to have created it, you own this open item. If some other commit is more relevant or if this does not belong as a v10 open item, please let us know. Otherwise, please observe the policy on open item ownership[1] and send a status update within three calendar days of this message. Include a date for your subsequent status update. Testers may discover new open items at any time, and I want to plan to get them all fixed well in advance of shipping v10. Consequently, I will appreciate your efforts toward speedy resolution. Thanks. [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization
On Wed, Sep 6, 2017 at 4:01 PM, Fabien COELHO wrote: > > Applies, compiles, works for me. > > Very very minor comments that I should have noticed before, sorry for this > additional round trip. Thank you for the dedicated review! > > In the help line, move -I just after -i, to put short options in > alphabetical and decreasing importance order. On this line, also add the > information about the default, something like: > > -i, --ini... > -I, --custom=[...]+ (default "ctgvp") > ... Agreed. Attached latest patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center pgbench_custom_initialization_v12.patch Description: Binary data
Re: [HACKERS] Restricting maximum keep segments by repslots
Hello, At Fri, 1 Sep 2017 23:49:21 -0400, Peter Eisentraut wrote in <751e09c4-93e0-de57-edd2-e64c4950f...@2ndquadrant.com> > I'm still concerned about how the critical situation is handled. Your > patch just prints a warning to the log and then goes on -- doing what? > > The warning rolls off the log, and then you have no idea what happened, > or how to recover. The victims should be complaining in their log files, but, yes, I must admit that it extremely resembles /dev/null. And the catastrophe comes suddenly. > I would like a flag in pg_replication_slots, and possibly also a > numerical column that indicates how far away from the critical point > each slot is. That would be great for a monitoring system. Great! I'll do that right now. Thanks. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] A design for amcheck heapam verification
On Wed, Aug 30, 2017 at 9:29 AM, Peter Geoghegan wrote: > On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera > wrote: >> Eh, if you want to optimize it for the case where debug output is not >> enabled, make sure to use ereport() not elog(). ereport() >> short-circuits evaluation of arguments, whereas elog() does not. > > I should do that, but it's still not really noticeable. Since this patch has now bit-rotted, I attach a new revision, V2. Apart from fixing some Makefile bitrot, this revision also makes some small tweaks as suggested by Thomas and Alvaro. The documentation is also revised and expanded, and now discusses practical aspects of the set membership being tested using a Bloom filter, how that relates to maintenance_work_mem, and so on. Note that this revision does not let the Bloom filter caller use their own dynamic shared memory, which is something that Thomas asked about. While that could easily be added, I think it should happen later. I really just wanted to make sure that my Bloom filter was not in some way fundamentally incompatible with Thomas' planned enhancements to (parallel) hash join. -- Peter Geoghegan From d4dda95dd41204315dc12936fac83d2df8676992 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Thu, 24 Aug 2017 20:58:21 -0700 Subject: [PATCH 1/2] Add Bloom filter data structure implementation. A Bloom filter is a space-efficient, probabilistic data structure that can be used to test set membership. Callers will sometimes incur false positives, but never false negatives. The rate of false positives is a function of the total number of elements and the amount of memory available for the Bloom filter. Two classic applications of Bloom filters are cache filtering, and data synchronization testing. Any user of Bloom filters must accept the possibility of false positives as a cost worth paying for the benefit in space efficiency.
--- src/backend/lib/Makefile | 4 +- src/backend/lib/README| 2 + src/backend/lib/bloomfilter.c | 297 ++ src/include/lib/bloomfilter.h | 26 4 files changed, 327 insertions(+), 2 deletions(-) create mode 100644 src/backend/lib/bloomfilter.c create mode 100644 src/include/lib/bloomfilter.h diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile index d1fefe4..191ea9b 100644 --- a/src/backend/lib/Makefile +++ b/src/backend/lib/Makefile @@ -12,7 +12,7 @@ subdir = src/backend/lib top_builddir = ../../.. include $(top_builddir)/src/Makefile.global -OBJS = binaryheap.o bipartite_match.o dshash.o hyperloglog.o ilist.o \ - knapsack.o pairingheap.o rbtree.o stringinfo.o +OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \ + ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/lib/README b/src/backend/lib/README index 5e5ba5e..376ae27 100644 --- a/src/backend/lib/README +++ b/src/backend/lib/README @@ -3,6 +3,8 @@ in the backend: binaryheap.c - a binary heap +bloomfilter.c - probabilistic, space-efficient set membership testing + hyperloglog.c - a streaming cardinality estimator pairingheap.c - a pairing heap diff --git a/src/backend/lib/bloomfilter.c b/src/backend/lib/bloomfilter.c new file mode 100644 index 000..e93f9b0 --- /dev/null +++ b/src/backend/lib/bloomfilter.c @@ -0,0 +1,297 @@ +/*- + * + * bloomfilter.c + * Minimal Bloom filter + * + * A Bloom filter is a probabilistic data structure that is used to test an + * element's membership of a set. False positives are possible, but false + * negatives are not; a test of membership of the set returns either "possibly + * in set" or "definitely not in set". This can be very space efficient when + * individual elements are larger than a few bytes, because elements are hashed + * in order to set bits in the Bloom filter bitset. + * + * Elements can be added to the set, but not removed. 
The more elements that + * are added, the larger the probability of false positives. Caller must hint + * an estimated total size of the set when its Bloom filter is initialized. + * This is used to balance the use of memory against the final false positive + * rate. + * + * Copyright (c) 2017, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/backend/lib/bloomfilter.c + * + *- + */ +#include "postgres.h" + +#include <math.h> + +#include "access/hash.h" +#include "lib/bloomfilter.h" +#include "utils/memutils.h" + +#define MAX_HASH_FUNCS 10 +#define NBITS(filt) ((1 << (filt)->bloom_power)) + +typedef struct bloom_filter +{ + /* 2 ^ bloom_power is the size of the bitset (in bits) */ + int bloom_power; + unsigned char *bitset; + + /* K hash functions are used, which are randomly seeded */ + int k_hash_funcs; +
Re: [HACKERS] WIP: Failover Slots
On 14 August 2017 at 11:56, Craig Ringer wrote: > > I don't want to block failover slots on decoding on standby just because > decoding on standby would be nice to have. > However, during discussion with Thomas Munro a point has come up that does block failover slots as currently envisioned - silent timeline divergence. It's a solid reason why the current design and implementation is insufficient to solve the problem. This issue exists both with the original failover slots and with the model Robert and I were discussing. Say a decoding client has replayed from master up to commit of xid 42 at 1/1000 and confirmed flush, then a failover slots standby of the master is promoted. The standby has only received WAL from the failed master up to 1/500 with most recent xid 20. Now the standby does some other new xacts, pushing xid up to 30 at 1/1000 then continuing to insert until xid 50 at lsn 1/2000. Then the logical client reconnects. The logical client will connect to the failover slot fine, and start replay. But it'll ask for replay to start at 1/1000. The standby will happily fast-forward the slot (as it should), and start replay after 1/1000. But now we have silent divergence in timelines. The logical replica has received and committed xacts 20...42 at lsn 1/500 through 1/1000, but these are not present on the promoted master. And the replica has skipped over the new-master's xids 20...30 with lsns 1/500 through 1/1000, so they're present on the new master but not the replica. IMO, this shows that not including the timeline in replication origins was a bit of a mistake, since we'd trivially detect this if they were included - but it's a bit late now. And anyway, detection would just mean logical rep would break, which doesn't help much. The simplest fix, but rather limited, is to require that failover candidates be in synchronous_standby_names, and delay ReorderBufferCommit sending the actual commit message until all peers in s_s_n confirm flush of the commit lsn. 
But that's not much good if you want sync rep for your logical connections too, and is generally a hack. A more general solution requires that masters be told which peers are failover candidates, so they can ensure ordering between logical decoding and physical failover candidates. Which effectively adds another kind of sync rep, where we do "wait for physical failover candidates to flush, and only then allow logical decoding". This actually seems pretty practical with the design Robert and I discussed, but it's definitely an expansion in scope. Alternately, we could require the decoding clients to keep an eye on the flush/replay positions of all failover candidates and delay commit+confirm of decoded xacts until the upstream's failover candidates have received and flushed up to that lsn. That starts to look a lot like a decoding on standby based model for logical failover, where the downstream maintains slots on each failover candidate upstream. So yeah. More work needed here. Even if we suddenly decided the original failover slots model was OK, it's not sufficient to fully solve the problem. (It's something I'd thought about for BDR failover, but never applied to failover slots: the problem of detecting or preventing divergence when the logical client is ahead of physical receive at the time the physical standby is promoted.) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Setting pd_lower in GIN metapage
On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane wrote: > The idea I'd had was to apply the masking only if pd_lower >= > SizeOfPageHeaderData, or if you wanted to be stricter, only if > pd_lower != 0. If we put in a check, it seems to me that the stricter one makes the most sense. pd_lower should remain at 0 on pre-10 servers. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers