Re: pg_dump is broken for partition tablespaces
On Wed, Mar 06, 2019 at 07:45:06PM +1300, David Rowley wrote:
> Partitioned indexes have this similar inherit tablespace from parent
> feature, so ca4103025dfe26 was intended to align the behaviour of the
> two. Partitioned indexes happen not to suffer from the same issue as
> the indexes are attached after their creation, similar to what I
> propose above.
>
> Can anyone see any fundamental reason that we should not create a
> partitioned table by doing CREATE TABLE followed by ATTACH PARTITION?
> If not, I'll write a patch that fixes it that way.

The part for partitioned indexes is already battle-proven, so if the
part for partitioned tables can be consolidated the same way, that
would be really nice.

> As far as I can see, the biggest fundamental difference with doing
> things this way will be that the column order of partitions will be
> preserved, where before it would inherit the order of the partitioned
> table. I'm a little unsure if doing this column reordering was an
> intended side-effect or not.

Thinking about it, I don't see any direct issues with that, to be honest.
--
Michael
ECPG regression with DECLARE STATEMENT support
Hi,

Commit bd7c95f0c1a38becffceb3ea7234d57167f6d4bf added DECLARE STATEMENT
support to ECPG.  This introduced a new grammar rule for EXEC SQL CLOSE
cur, which now gets transformed into a call to ECPGclose().

Prior to the above commit, one could declare a cursor inside a prepared
SQL statement, and "CLOSE cur_name" could also be executed as a normal
statement.  Example:

EXEC SQL PREPARE cur_query FROM "DECLARE cur1 CURSOR WITH HOLD FOR SELECT count(*) FROM pg_class";
EXEC SQL PREPARE fetch_stmt FROM "FETCH next FROM cur1";
EXEC SQL EXECUTE cur_query;
EXEC SQL EXECUTE fetch_stmt INTO :c;
EXEC SQL CLOSE cur1;

With commit bd7c95f0c1, "EXEC SQL CLOSE cur1" will fail and throw the
error "sqlcode -245 The cursor is invalid".

I think the problem here is that ECPGclose() tries to find the cursor in
"connection->cursor_stmts" and, if it doesn't find it there, just throws
an error.  The fix probably belongs in ECPGclose() - rather than
throwing an error, continue executing the statement "CLOSE cur_name"
with ecpg_do().

Attaching the ECPG program (test.pgc) for reference.

Thanks,

--
Rushabh Lathia
www.EnterpriseDB.com
Re: BUG #15668: Server crash in transformPartitionRangeBounds
Hi,

On 2019/03/06 15:48, Michael Paquier wrote:
> On Tue, Mar 05, 2019 at 11:04:17PM +0900, Amit Langote wrote:
>> Maybe we should error out as follows in
>> transformPartitionRangeBounds(), although that means we'll get
>> different error message than when using list partitioning syntax:
>
> Hm.  I don't think that this is a good idea as you could lose some
> information for the expression transformation handling, and the error
> handling becomes inconsistent depending on the partition bound
> strategy.  It seems to me that if we cannot extract any special value
> from the ColumnRef expression generated, then we ought to let
> transformPartitionBoundValue() and particularly transformExprRecurse()
> do the analysis work and complain if needed:
>
> =# CREATE TABLE rp_part PARTITION OF range_parted FOR VALUES FROM
> (unknown.unknown) TO (1);
> ERROR:  42P01: missing FROM-clause entry for table "unknown"
> LINE 1: ...p_part PARTITION OF range_parted FOR VALUES FROM (unknown.un...
>
> =# CREATE TABLE rp_part PARTITION OF range_parted FOR VALUES FROM
> (a.a.a.a.a.a.a.a.a.a.a.a) TO (1);
> ERROR:  42601: improper qualified name (too many dotted names):
> a.a.a.a.a.a.a.a.a.a.a.a
> LINE 1: ...p_part PARTITION OF range_parted FOR VALUES FROM (a.a.a.a.a
>
> What about something like the attached instead?  Minus the test cases,
> which should go to create_table.sql of course.

Thanks for looking at this.  Your patch seems better, because it allows
us to keep the error message consistent with the message one would get
with the list-partitioning syntax.

Thanks,
Amit
Re: BUG #15668: Server crash in transformPartitionRangeBounds
On Tue, Mar 05, 2019 at 11:04:17PM +0900, Amit Langote wrote:
> Maybe we should error out as follows in
> transformPartitionRangeBounds(), although that means we'll get
> different error message than when using list partitioning syntax:

Hm.  I don't think that this is a good idea as you could lose some
information for the expression transformation handling, and the error
handling becomes inconsistent depending on the partition bound
strategy.  It seems to me that if we cannot extract any special value
from the ColumnRef expression generated, then we ought to let
transformPartitionBoundValue() and particularly transformExprRecurse()
do the analysis work and complain if needed:

=# CREATE TABLE rp_part PARTITION OF range_parted FOR VALUES FROM
(unknown.unknown) TO (1);
ERROR:  42P01: missing FROM-clause entry for table "unknown"
LINE 1: ...p_part PARTITION OF range_parted FOR VALUES FROM (unknown.un...

=# CREATE TABLE rp_part PARTITION OF range_parted FOR VALUES FROM
(a.a.a.a.a.a.a.a.a.a.a.a) TO (1);
ERROR:  42601: improper qualified name (too many dotted names):
a.a.a.a.a.a.a.a.a.a.a.a
LINE 1: ...p_part PARTITION OF range_parted FOR VALUES FROM (a.a.a.a.a

What about something like the attached instead?  Minus the test cases,
which should go to create_table.sql of course.
--
Michael

diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index a37d1f18be..ee129ba18f 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -3750,8 +3750,16 @@ transformPartitionRangeBounds(ParseState *pstate, List *blist,
 				IsA(linitial(cref->fields), String))
 				cname = strVal(linitial(cref->fields));

-			Assert(cname != NULL);
-			if (strcmp("minvalue", cname) == 0)
+			if (cname == NULL)
+			{
+				/*
+				 * No field names have been found, meaning that there
+				 * is not much to do with special value handling. Instead
+				 * let the expression transformation handle any errors and
+				 * limitations.
+				 */
+			}
+			else if (strcmp("minvalue", cname) == 0)
 			{
 				prd = makeNode(PartitionRangeDatum);
 				prd->kind = PARTITION_RANGE_DATUM_MINVALUE;
pg_dump is broken for partition tablespaces
Over on [1] Andres pointed out that the pg_dump support for the
new-to-PG12 tablespace inheritance feature is broken.  This is the
feature added in ca4103025dfe26 to allow a partitioned table to have a
tablespace that acts as the default tablespace for newly attached
partitions.  The idea being that you can periodically change the default
tablespace for new partitions to a tablespace that sits on a disk
partition with more free space, without affecting the default tablespace
for normal non-partitioned tables.

Anyway... pg_dump is broken with this.  Consider:

create tablespace newts location '/tmp/newts';
create table listp (a int) partition by list(a) tablespace newts;
create table listp1 partition of listp for values in(1) tablespace pg_default;
create table listp2 partition of listp for values in(2);

select relname,relkind,reltablespace from pg_class
where relname like 'listp%' and relkind in('r','p') order by relname;

produces:

 relname | relkind | reltablespace
---------+---------+---------------
 listp   | p       |         16384
 listp1  | r       |             0
 listp2  | r       |         16384
(3 rows)

after dump/restore:

 relname | relkind | reltablespace
---------+---------+---------------
 listp   | p       |         16384
 listp1  | r       |         16384
 listp2  | r       |         16384
(3 rows)

Here the tablespace for listp1 was inherited from listp, but we really
should have restored this to use pg_default as was specified.  The
reason this occurs is that in pg_dump we do:

SET default_tablespace = '';
CREATE TABLE public.listp1 PARTITION OF public.listp FOR VALUES IN (1);

so, since we're creating the table initially as a partition, the logic
that applies the default tablespace from the parent kicks in.  If we
instead did:

CREATE TABLE public.listp1 (
    a integer
);
ALTER TABLE public.listp ATTACH PARTITION public.listp1 FOR VALUES IN (1);

then we'd have no issue, as the tablespace would be set to whatever
default_tablespace is set to.

Partitioned indexes have this similar inherit-tablespace-from-parent
feature, so ca4103025dfe26 was intended to align the behaviour of the
two.  Partitioned indexes happen not to suffer from the same issue as
the indexes are attached after their creation, similar to what I propose
above.

Can anyone see any fundamental reason that we should not create a
partitioned table by doing CREATE TABLE followed by ATTACH PARTITION?
If not, I'll write a patch that fixes it that way.

As far as I can see, the biggest fundamental difference with doing
things this way will be that the column order of partitions will be
preserved, where before it would inherit the order of the partitioned
table.  I'm a little unsure if doing this column reordering was an
intended side-effect or not.

[1] https://www.postgresql.org/message-id/20190305060804.jv5mz4slrnelh...@alap3.anarazel.de

--
David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
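[Editor's note: for clarity, this is a sketch of what the proposed dump
output would look like for the listp1 example above — i.e. the fix being
proposed, not what pg_dump currently emits:]

```sql
-- Sketch of the proposed pg_dump output: create the partition as a
-- standalone table first, so the (empty) default_tablespace setting
-- decides its tablespace, then attach it to the parent.
SET default_tablespace = '';

CREATE TABLE public.listp1 (
    a integer
);

-- Attaching does not change the tablespace already assigned above, so
-- listp1 keeps pg_default instead of inheriting newts from listp.
ALTER TABLE public.listp ATTACH PARTITION public.listp1 FOR VALUES IN (1);
```

This mirrors how pg_dump already handles partitioned indexes, which are
likewise attached after creation.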
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/05 17:56, Tatsuro Yamada wrote:
> Hi Robert!
>
> On 2019/03/05 11:35, Robert Haas wrote:
>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada wrote:
>>> === Current design ===
>>>
>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>>> Depending on which one is chosen, the command will proceed in the
>>> following sequence of phases:
>>>
>>> * Scan method: Seq Scan
>>>   0. initializing (*2)
>>>   1. seq scanning heap (*1)
>>>   3. sorting tuples (*2)
>>>   4. writing new heap (*1)
>>>   5. swapping relation files (*2)
>>>   6. rebuilding index (*2)
>>>   7. performing final cleanup (*2)
>>>
>>> * Scan method: Index Scan
>>>   0. initializing (*2)
>>>   2. index scanning heap (*1)
>>>   5. swapping relation files (*2)
>>>   6. rebuilding index (*2)
>>>   7. performing final cleanup (*2)
>>>
>>> VACUUM FULL command will proceed in the following sequence of phases:
>>>   1. seq scanning heap (*1)
>>>   5. swapping relation files (*2)
>>>   6. rebuilding index (*2)
>>>   7. performing final cleanup (*2)
>>>
>>> (*1): increasing the value in heap_tuples_scanned column
>>> (*2): only shows the phase in the phase column
>>
>> All of that sounds good.
>>
>>> The view provides the information of CLUSTER command progress details
>>> as follows:
>>>
>>> # \d pg_stat_progress_cluster
>>>            View "pg_catalog.pg_stat_progress_cluster"
>>>         Column        |  Type   | Collation | Nullable | Default
>>> ----------------------+---------+-----------+----------+---------
>>>  pid                  | integer |           |          |
>>>  datid                | oid     |           |          |
>>>  datname              | name    |           |          |
>>>  relid                | oid     |           |          |
>>>  command              | text    |           |          |
>>>  phase                | text    |           |          |
>>>  cluster_index_relid  | bigint  |           |          |
>>>  heap_tuples_scanned  | bigint  |           |          |
>>>  heap_tuples_vacuumed | bigint  |           |          |
>>
>> Still not sure if we need heap_tuples_vacuumed.  We could try to
>> report heap_blks_scanned and heap_blks_total like we do for VACUUM,
>> if we're using a Seq Scan.
>
> I have no strong opinion on adding heap_tuples_vacuumed, so I'll remove
> that in the next patch.
>
> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is
> possible to get those from initscan().  I'll investigate it more.
>
>   cluster.c
>     copy_heap_data()
>       heap_beginscan()
>         heap_beginscan_internal()
>           initscan()
>
>>> === Discussion points ===
>>>
>>> - Progress counter for "3. sorting tuples" phase
>>>   - Should we add pgstat_progress_update_param() in tuplesort.c like
>>>     a "trace_sort"?  Thanks to Peter Geoghegan for the useful advice!
>>
>> How would we avoid an abstraction violation?
>
> Hmm... What do you mean by an abstraction violation?  If it is
> difficult to solve, I'd rather not add the progress counter for
> sorting tuples.
>
>>> - Progress counter for "6. rebuilding index" phase
>>>   - Should we add "index_vacuum_count" in the view like the vacuum
>>>     progress monitor?  If yes, I'll add
>>>     pgstat_progress_update_param() to reindex_relation() of index.c.
>>>     However, I'm not sure whether it is okay or not.
>>
>> Doesn't seem unreasonable to me.
>
> I see, I'll add it later.

Attached file is a revised and WIP patch including:

  - Remove heap_tuples_vacuumed
  - Add heap_blks_scanned and heap_blks_total
  - Add index_vacuum_count

I tried to add the heap_blks_scanned and heap_blks_total columns, and I
realized that the "heap_tuples_scanned" column is suitable as a counter
whether the scan method is an index scan or a seq scan, because CLUSTER
works on a tuple basis.

Regards,
Tatsuro Yamada

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d16c3d0ea5..a88e8d2492 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -51,6 +51,7 @@
 #include "catalog/storage.h"
 #include "commands/tablecmds.h"
 #include "commands/event_trigger.h"
+#include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -58,6 +59,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parser.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -3850,6 +3852,7 @@ reindex_relation(Oid relid, int flags, int options)
 	List	   *indexIds;
 	bool		is_pg_class;
 	bool		result;
+	int			i;

 	/*
 	 * Open and lock the relation. ShareLock is sufficient since we only need
@@ -3937,6 +3940,7 @@ reindex_relation(Oid relid, int flags, int options)

 	/* Reindex all the indexes. */
 	doneIndexes = NIL;
+	i = 1;
 	foreach(indexId, indexIds)
 	{
 		Oid			indexOid = lfirst_oid(indexId);
@@ -3954,6 +3958,11 @@
Re: Update does not move row across foreign partitions in v11
Fujita-san,

On 2019/03/06 15:10, Etsuro Fujita wrote:
> --- a/doc/src/sgml/ddl.sgml
> +++ b/doc/src/sgml/ddl.sgml
> @@ -3376,6 +3376,13 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
>
> +      UPDATE row movement is not supported in the cases
> +      where the old row is contained in a foreign table partition.
>
> ISTM that it's also a limitation that rows can be moved from a local
> partition to a foreign partition *if the FDW supports tuple routing*,
> so I would vote for mentioning that as well here.

Thanks for checking.  I have updated the patch to include a line about
this in the same paragraph, because maybe we don't need to make a new
one for it.

Thanks,
Amit

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 8314fce78f..c6b5bd0e54 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3878,6 +3878,15 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
+      UPDATE row movement is not supported in the cases
+      where the old row is contained in a foreign table partition.  Also,
+      moving the new row into a foreign table partition is supported only
+      if the table's FDW supports tuple routing.

       BEFORE ROW triggers, if necessary, must be defined on individual
       partitions, not the partitioned table.
Re: [HACKERS] Block level parallel vacuum
On Mon, Mar 4, 2019 at 10:27 AM Masahiko Sawada wrote:
> On Sat, Mar 2, 2019 at 3:54 AM Robert Haas wrote:
>> On Fri, Mar 1, 2019 at 12:19 AM Masahiko Sawada wrote:
>>>> I wonder if we really want this behavior.  Should a setting that
>>>> controls the degree of parallelism when scanning the table also
>>>> affect VACUUM?  I tend to think that we probably don't ever want
>>>> VACUUM of a table to be parallel by default, but rather something
>>>> that the user must explicitly request.  Happy to hear other
>>>> opinions.  If we do want this behavior, I think this should be
>>>> written differently, something like this: The PARALLEL N option to
>>>> VACUUM takes precedence over this option.
>>>
>>> For example, I can imagine a use case where a batch job does parallel
>>> vacuum to some tables in a maintenance window.  The batch operation
>>> will need to compute and specify the degree of parallelism every
>>> time, according to, for instance, the number of indexes, which would
>>> be troublesome.  But if we can set the degree of parallelism for each
>>> table it can just do 'VACUUM (PARALLEL)'.
>>
>> True, but the setting in question would also affect the behavior of
>> sequential scans and index scans.  TBH, I'm not sure that the
>> parallel_workers reloption is really a great design as it is: is
>> hard-coding the number of workers really what people want?  Do they
>> really want the same degree of parallelism for sequential scans and
>> index scans?  Why should they want the same degree of parallelism
>> also for VACUUM?  Maybe they do, and maybe somebody can explain why
>> they do, but as of now, it's not obvious to me why that should be
>> true.
>
> I think that there are users who want to specify the degree of
> parallelism.  I think that hard-coding the number of workers would be a
> good design for something like VACUUM, which is a simple operation on a
> single object; since there are no joins or aggregations, it'd be
> relatively easy to compute it.  That's why the patch introduces the
> PARALLEL N option as well.  I think that a reloption for parallel
> vacuum would just be a way to save the degree of parallelism.  And I
> agree that users don't necessarily want to use the same degree of
> parallelism for VACUUM, so maybe it'd be better to add a new reloption
> like parallel_vacuum_workers.  On the other hand, it can be a separate
> patch; I can remove the reloption part from this patch and will propose
> it when there are requests.

Okay, attached the latest version of the patch set.  I've incorporated
all the comments I got and separated out the part making vacuum options
a Node (0001 patch).  The patch set doesn't use parallel_workers any
more; it might be proposed in another form again in the future if
requested.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

v16-0001-Make-vacuum-options-a-Node.patch
v16-0003-Add-paralell-P-option-to-vacuumdb-command.patch
v16-0002-Add-parallel-option-to-VACUUM-command.patch
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/06 1:13, Robert Haas wrote:
> On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada wrote:
>>>> === Discussion points ===
>>>>
>>>> - Progress counter for "3. sorting tuples" phase
>>>>   - Should we add pgstat_progress_update_param() in tuplesort.c like
>>>>     a "trace_sort"?  Thanks to Peter Geoghegan for the useful advice!
>>>
>>> How would we avoid an abstraction violation?
>>
>> Hmm... What do you mean by an abstraction violation?  If it is
>> difficult to solve, I'd rather not add the progress counter for
>> sorting tuples.
>
> What I mean is... I think it would be useful to have this counter, but
> I'm not sure how the tuplesort code would know to update the counter in
> this case and not in other cases.  The tuplesort code is used for lots
> of things; we can't update a counter for CLUSTER if the tuplesort is
> being used for CREATE INDEX or a Sort node in a query or whatever.  So
> my question is how we would indicate to the tuplesort that it needs to
> do the counter update, and whether that would end up making for ugly
> code.

Thanks for your explanation!  I understand it now.  I guess it amounts
to an API for getting the progress of sort processing, so let us put it
aside for now.  I'd like to leave that as is for an appropriate person.

Regards,
Tatsuro Yamada
Re: Update does not move row across foreign partitions in v11
(2019/03/06 13:53), Amit Langote wrote:
> On 2019/03/06 13:30, David Rowley wrote:
>> I think you missed my point.  If there's no special support for "tuple
>> moving", as you say, then what help is it to tell the user "if the FDW
>> supports tuple routing"?  The answer is, it's not any help.  How would
>> the user check such a fact?
>
> Hmm, maybe by getting the following error, like one would get in PG 10
> when using postgres_fdw-managed partitions:
>
> ERROR: cannot route inserted tuples to a foreign table
>
> Getting the above error is perhaps not the best way for a user to learn
> of this fact, but maybe we (and hopefully other FDW authors) should
> mention this in the documentation?

+1

>> As far as I can tell, this is just the requirements as defined in
>> CheckValidResultRel() for CMD_INSERT.  Fragments of which I pasted
>> above.
>
> Only supporting INSERT doesn't suffice though.  An FDW which intends to
> support tuple routing and hence one-way tuple moving needs to be
> updated like postgres_fdw was in PG 11.

That's right; the "if the FDW supports it" in the documentation refers
to the FDW's support for the callback functions BeginForeignInsert() and
EndForeignInsert() described in 57.2.4. FDW Routines For Updating
Foreign Tables [1], in addition to ExecForeignInsert(), as stated there:

"Tuples inserted into a partitioned table by INSERT or COPY FROM are
routed to partitions.  If an FDW supports routable foreign-table
partitions, it should also provide the following callback functions."

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/docs/current/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
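[Editor's note: as a concrete illustration of the callback requirement
discussed above, here is a sketch of an UPDATE that moves a row into a
foreign partition.  All object names are hypothetical, and a working
postgres_fdw server named "loopback" plus the remote table are assumed
to exist already:]

```sql
-- Hypothetical setup: a partitioned table with one local and one
-- foreign partition (server "loopback" and table "parted_remote_tbl"
-- are assumed to exist).
CREATE TABLE parted (a int) PARTITION BY LIST (a);
CREATE TABLE parted_local PARTITION OF parted FOR VALUES IN (1);
CREATE FOREIGN TABLE parted_remote PARTITION OF parted FOR VALUES IN (2)
    SERVER loopback OPTIONS (table_name 'parted_remote_tbl');

INSERT INTO parted VALUES (1);

-- Deletes the row from parted_local and re-routes the new version into
-- parted_remote.  This works only if the FDW provides the tuple-routing
-- callbacks (BeginForeignInsert/EndForeignInsert, as postgres_fdw does
-- since PG 11); an FDW without them fails with
-- "ERROR: cannot route inserted tuples to a foreign table".
UPDATE parted SET a = 2 WHERE a = 1;
```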
Re: Update does not move row across foreign partitions in v11
(2019/03/06 13:18), Amit Langote wrote:
>> The main problem here is indeed that the limitation is not listed
>> under the partitioning limitations in ddl.sgml, where it's easier to
>> notice than in the UPDATE page.
>
> Agreed.  I've updated my patch to remove the release-11.sgml changes.

Thanks for the updated patch!

> --- a/doc/src/sgml/ddl.sgml
> +++ b/doc/src/sgml/ddl.sgml
> @@ -3376,6 +3376,13 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02
>
> +      UPDATE row movement is not supported in the cases
> +      where the old row is contained in a foreign table partition.

ISTM that it's also a limitation that rows can be moved from a local
partition to a foreign partition *if the FDW supports tuple routing*, so
I would vote for mentioning that as well here.

Best regards,
Etsuro Fujita
Re: Tab completion for SKIP_LOCKED option
On Wed, Mar 6, 2019 at 2:46 PM Michael Paquier wrote:
> On Wed, Mar 06, 2019 at 11:45:01AM +0900, Masahiko Sawada wrote:
>> I realized that the tab completions for the SKIP_LOCKED option of both
>> VACUUM and ANALYZE are missing.  Attached patch adds them.
>
> Thanks Sawada-san, committed.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: Allowing extensions to supply operator-/function-specific info
Paul Ramsey writes:
>> On Mar 5, 2019, at 3:56 PM, Tom Lane wrote:
>> Then you're at least missing adequate tests for the 3-arg functions...
>> 3 args with the index column second will not work as this stands.

> Some of the operators are indifferent to order (&&, overlaps) and
> others are not (@, within) (~, contains).

Right.

> The 3-arg functions fortunately all have && strategies.

Hm ... that probably explains why it's okay to apply the "expand"
behavior to the non-indexed argument regardless of which one that is.
I imagine the official definition of those functions isn't really
symmetrical about which argument the expansion applies to, though?

> The types on either side of the operators are always the same
> (geometry && geometry), ST_Intersects(geometry, geometry).
> I could simply be getting a free pass from the simplicity of my setup?

Yeah, seems so.  The real reason I'm pestering you about this is that,
since you're the first outside user of the support-function
infrastructure, other people are likely to be looking at your code to
see how to do things.  So I'd like your code to not contain unnecessary
dependencies on accidents like that ...

			regards, tom lane
Re: Tab completion for SKIP_LOCKED option
On Wed, Mar 06, 2019 at 11:45:01AM +0900, Masahiko Sawada wrote:
> I realized that the tab completions for the SKIP_LOCKED option of both
> VACUUM and ANALYZE are missing.  Attached patch adds them.

Thanks Sawada-san, committed.
--
Michael
Re: speeding up planning with partitions
Imai-san,

Thanks for the review.

On 2019/03/06 11:09, Imai, Yoshikazu wrote:
> Here is the code review for the previous v26 patches.
>
> [0002]
> In expand_inherited_rtentry():
>
> expand_inherited_rtentry()
> {
>     ...
> +   RelOptInfo *rel = NULL;
>
> can be declared later:
>
>     if (oldrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>         ...
>     else
>     {
>         List       *appinfos = NIL;
>         RangeTblEntry *childrte;
>         Index       childRTindex;
> +       RelOptInfo *rel = NULL;

This is fixed in v27.

> [0003]
> In inheritance_planner:
>
> +       rtable_with_target = list_copy(root->parse->rtable);
>
> can be:
>
> +       rtable_with_target = list_copy(parse->rtable);

Sure.

> [0004 or 0005]
> There are redundant processes in add_appendrel_other_rels (or
> expand_xxx_rtentry()?).  I modified add_appendrel_other_rels like below
> and it actually worked.
>
> add_appendrel_other_rels(PlannerInfo *root, RangeTblEntry *rte, Index rti)
> {
>     ListCell   *l;
>     RelOptInfo *rel;
>
>     /*
>      * Add inheritance children to the query if not already done.  For
>      * child tables that are themselves partitioned, their children
>      * will be added recursively.
>      */
>     if (rte->rtekind == RTE_RELATION && !root->contains_inherit_children)
>     {
>         expand_inherited_rtentry(root, rte, rti);
>         return;
>     }
>
>     rel = find_base_rel(root, rti);
>
>     foreach(l, root->append_rel_list)
>     {
>         AppendRelInfo *appinfo = lfirst(l);
>         Index       childRTindex = appinfo->child_relid;
>         RangeTblEntry *childrte;
>         RelOptInfo *childrel;
>
>         if (appinfo->parent_relid != rti)
>             continue;
>
>         Assert(childRTindex < root->simple_rel_array_size);
>         childrte = root->simple_rte_array[childRTindex];
>         Assert(childrte != NULL);
>         build_simple_rel(root, childRTindex, rel);
>
>         /* Child may itself be an inherited relation. */
>         if (childrte->inh)
>             add_appendrel_other_rels(root, childrte, childRTindex);
>     }
> }

If you don't initialize part_rels here, then it will break any code in
the planner that expects a partitioned rel with a non-zero number of
partitions to have part_rels set to non-NULL.  At the moment, that
includes the code that implements partitioning-specific optimizations
such as partitionwise aggregate and join, run-time pruning, etc.  No
existing regression tests cover the cases where a source inherited rel
participates in those optimizations, so you didn't find anything that
broke.

For an example, consider the following update query:

update p set a = p1.a + 1 from p p1 where p1.a = (select 1);

The planner will create a run-time-pruning-aware Append node for p
(aliased p1), for which it will need to use the part_rels array.  Note
that p in this case is a source relation, which the above code doesn't
initialize correctly.  Maybe we should add such regression tests.

> I didn't investigate that problem, but there is another memory increase
> issue, which is because of the 0003 patch I think.  I'll try to solve
> the latter issue.

Interested in details, as it seems to be a separate problem.

Thanks,
Amit
Re: Patch to document base64 encoding
On Wed, 6 Mar 2019 11:27:38 +0900
Michael Paquier wrote:
> On Tue, Mar 05, 2019 at 07:55:22PM -0600, Karl O. Pinc wrote:
>> Attached: doc_base64_v4.patch
>
> Details about the "escape" mode are already available within the
> description of function "encode".  Wouldn't we want to consolidate a
> description for all the modes at the same place, including some words
> for hex?  Your patch only includes the description of base64, which is
> a good addition, still not consistent with the rest.  A paragraph
> after all the functions listed is fine I think, as the description is
> long so it would bloat the table if included directly.

Makes sense.  (As did hyperlinking to the RFC.)

(No matter how simple I think a patch is going to be, it always turns
into a project. :)

Attached: doc_base64_v5.patch

Made index entries for hex and escape encodings.  Added the word
"encoding" to the index entries.  Made entries with terms for base64,
hex, and escape encodings.  Added documentation for hex and escape
encodings, including output formats and what constitutes acceptable
input.

Regards,
Karl
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..bd337fd530 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
     decode
+
+    base64 encoding
+
     decode(string text, format text)
@@ -1769,16 +1772,25 @@
     encode
+
+    base64 encoding
+
+    hex encoding
+
+    escape encoding
+
     encode(data bytea, format text)
     text
     Encode binary data into a textual representation.  Supported
-    formats are: base64, hex, escape.
-    escape converts zero bytes and high-bit-set bytes to
-    octal sequences (\nnn) and
-    doubles backslashes.
+    formats are:
+    base64,
+    hex,
+    escape.
     encode('123\000\001', 'base64')
     MTIzAAE=
@@ -2365,6 +2377,84 @@ format treats a NULL as a zero-element array.
+
+  base64 encoding
+
+  hex encoding
+
+  escape encoding
+
+  The string and the binary encode
+  and decode functions support the following
+  encodings:
+
+  base64
+
+    The base64 encoding is that
+    of https://tools.ietf.org/html/rfc2045#section-6.8 RFC
+    2045 section 6.8.  As per the RFC, encoded lines are
+    broken at 76 characters.  However instead of the MIME CRLF
+    end-of-line marker, only a newline is used for end-of-line.
+
+    The carriage-return, newline, space, and tab characters are
+    ignored by decode.  Otherwise, an error is
+    raised when decode is supplied invalid
+    base64 data, including when trailing padding is incorrect.
+
+  hex
+
+    hex represents each 4 bits of data as a single
+    hexadecimal digit, 0
+    through f.  Encoding outputs
+    the a-f hex digits in lower
+    case.  Because the smallest unit of data is 8 bits, there are
+    always an even number of characters returned
+    by encode.
+
+    The decode function
+    accepts a-f characters in
+    either upper or lower case.  An error is raised
+    when decode is supplied invalid hex data,
+    including when given an odd number of characters.
+
+  escape
+
+    escape converts zero bytes and high-bit-set
+    bytes to octal sequences
+    (\nnn) and doubles
+    backslashes.  Encoding always produces 4 characters for each
+    high-bit-set input byte.
+
+    The decode function accepts any number of
+    octal digits after a \ character.  An error is
+    raised when decode is supplied a
+    single \ not followed by an octal digit.
+
   See also the aggregate function string_agg in .
@@ -3577,16 +3667,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
     encode
+
+    base64 encoding
+
+    hex encoding
+
+    escape encoding
+
     encode(data bytea, format text)
     text
     Encode binary data into a textual representation.  Supported
-    formats are: base64, hex, escape.
-    escape converts zero bytes and high-bit-set
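[Editor's note: the encoding rules documented above can be illustrated
with a few sketched examples.  The base64 example is the one already
used in func.sgml; the other outputs follow the documented rules and
assume the default bytea_output = 'hex' display setting:]

```sql
-- base64: example from the documentation above
SELECT encode('123\000\001', 'base64');    -- MTIzAAE=
SELECT decode('MTIzAAE=', 'base64');       -- \x3132330001

-- hex: output is always an even number of lower-case digits;
-- decode accepts upper or lower case
SELECT encode('\x0001ff'::bytea, 'hex');   -- 0001ff
SELECT decode('0001FF', 'hex');            -- \x0001ff

-- escape: zero bytes and high-bit-set bytes become \nnn octal
-- sequences (0x80 = octal 200)
SELECT encode('\x310080'::bytea, 'escape');  -- 1\000\200
```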
Re: Update does not move row across foreign partitions in v11
On 2019/03/06 13:30, David Rowley wrote:
> On Wed, 6 Mar 2019 at 17:20, Amit Langote wrote:
>> On 2019/03/06 12:47, David Rowley wrote:
>>> It seems a bit light on detail to me.  If I was a user I'd want to
>>> know what exactly the FDW needed to support this.  Does it need a
>>> special partition move function?  Looking at ExecFindPartition(),
>>> this check seems to be done in CheckValidResultRel() and is
>>> basically:
>>>
>>> case RELKIND_FOREIGN_TABLE:
>>>     /* Okay only if the FDW supports it */
>>>     fdwroutine = resultRelInfo->ri_FdwRoutine;
>>>     switch (operation)
>>>     {
>>>         case CMD_INSERT:
>>>             if (fdwroutine->ExecForeignInsert == NULL)
>>>                 ereport(ERROR,
>>>                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>>>                          errmsg("cannot insert into foreign table \"%s\"",
>>>                                 RelationGetRelationName(resultRel))));
>>>
>>> Alternatively, we could just remove the mention about "if the FDW
>>> supports it", since it's probably unlikely for an FDW not to support
>>> INSERT.
>>
>> AFAIK, there's no special support in FDWs for "tuple moving" as such.
>> The "if the FDW supports it" refers to the FDW's ability to handle
>> tuple routing.  Note that moving/re-routing involves calling
>> ExecPrepareTupleRouting followed by ExecInsert on the new tuple after
>> the old tuple is deleted.  If an FDW doesn't support tuple routing,
>> then a tuple cannot be moved into it.  That's what that text is
>> talking about.
>>
>> Maybe, we should reword it as "if the FDW supports tuple routing", so
>> that a reader doesn't go looking around for "tuple moving support" in
>> FDWs.
>
> I think you missed my point.  If there's no special support for "tuple
> moving", as you say, then what help is it to tell the user "if the FDW
> supports tuple routing"?  The answer is, it's not any help.  How would
> the user check such a fact?

Hmm, maybe by getting the following error, like one would get in PG 10
when using postgres_fdw-managed partitions:

ERROR: cannot route inserted tuples to a foreign table

Getting the above error is perhaps not the best way for a user to learn
of this fact, but maybe we (and hopefully other FDW authors) should
mention this in the documentation?

> As far as I can tell, this is just the requirements as defined in
> CheckValidResultRel() for CMD_INSERT.  Fragments of which I pasted
> above.

Only supporting INSERT doesn't suffice though.  An FDW which intends to
support tuple routing and hence one-way tuple moving needs to be updated
like postgres_fdw was in PG 11.

Thanks,
Amit
Re: Inheriting table AMs for partitioned tables
On Tue, 5 Mar 2019 at 19:08, Andres Freund wrote:
> On 2019-03-05 16:01:50 +1300, David Rowley wrote:
> > I'd suggest it's made to work the same way as ca4103025dfe26 made
> > tablespaces work.
>
> Hm, is that actually correct? Because as far as I can tell that doesn't
> have the necessary pg_dump code to make this behaviour persistent:
>
> CREATE TABLESPACE frak LOCATION '/tmp/frak';
> CREATE TABLE test_tablespace (a text, b int) PARTITION BY list (a) TABLESPACE frak;
> CREATE TABLE test_tablespace_1 PARTITION OF test_tablespace FOR VALUES in ('a');
> CREATE TABLE test_tablespace_2 PARTITION OF test_tablespace FOR VALUES in ('b') TABLESPACE pg_default;
> CREATE TABLE test_tablespace_3 PARTITION OF test_tablespace FOR VALUES in ('c') TABLESPACE frak;
>
> SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE 'test_tablespace%' ORDER BY 1;
> ┌───────────────────┬─────────┬───────────────┐
> │      relname      │ relkind │ reltablespace │
> ├───────────────────┼─────────┼───────────────┤
> │ test_tablespace   │ p       │         16384 │
> │ test_tablespace_1 │ r       │         16384 │
> │ test_tablespace_2 │ r       │             0 │
> │ test_tablespace_3 │ r       │         16384 │
> └───────────────────┴─────────┴───────────────┘
>
> [pg_dump/pg_restore]
>
> ┌───────────────────┬─────────┬───────────────┐
> │      relname      │ relkind │ reltablespace │
> ├───────────────────┼─────────┼───────────────┤
> │ test_tablespace   │ p       │         16384 │
> │ test_tablespace_1 │ r       │         16384 │
> │ test_tablespace_2 │ r       │         16384 │
> │ test_tablespace_3 │ r       │         16384 │
> └───────────────────┴─────────┴───────────────┘

frak... that's a bit busted. I can't instantly think of a fix, but I see the same problem does not seem to exist for partition indexes, so that's a relief since that's already in PG11. I'll take this up on another thread once I have something good to report.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Update does not move row across foreign partitions in v11
On Wed, 6 Mar 2019 at 17:20, Amit Langote wrote: > > On 2019/03/06 12:47, David Rowley wrote: > > It seems a bit light on detail to me. If I was a user I'd want to know > > what exactly the FDW needed to support this. Does it need a special > > partition move function? Looking at ExecFindPartition(), this check > > seems to be done in CheckValidResultRel() and is basically: > > > > case RELKIND_FOREIGN_TABLE: > > /* Okay only if the FDW supports it */ > > fdwroutine = resultRelInfo->ri_FdwRoutine; > > switch (operation) > > { > > case CMD_INSERT: > > if (fdwroutine->ExecForeignInsert == NULL) > > ereport(ERROR, > > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > errmsg("cannot insert into foreign table \"%s\"", > > RelationGetRelationName(resultRel; > > > > Alternatively, we could just remove the mention about "if the FDW > > supports it", since it's probably unlikely for an FDW not to support > > INSERT. > > AFAIK, there's no special support in FDWs for "tuple moving" as such. The > "if the FDW supports it" refers to the FDW's ability to handle tuple > routing. Note that moving/re-routing involves calling > ExecPrepareTupleRouting followed by ExecInsert on the new tupls after the > old tuple is deleted. If an FDW doesn't support tuple routing, then a > tuple cannot be moved into it. That's what that text is talking about. > > Maybe, we should reword it as "if the FDW supports tuple routing", so that > a reader doesn't go looking around for "tuple moving support" in FDWs. I think you missed my point. If there's no special support for "tuple moving", as you say, then what help is it to tell the user "if the FDW supports tuple routing"? The answer is, it's not any help. How would the user check such a fact? As far as I can tell, this is just the requirements as defined in CheckValidResultRel() for CMD_INSERT. Fragments of which I pasted above. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: New vacuum option to do only freezing
On Tue, Mar 5, 2019 at 8:01 PM Kyotaro HORIGUCHI wrote:
> Hello, I have some other comments.

Thank you for the comment!

On Tue, Mar 5, 2019 at 8:01 PM Kyotaro HORIGUCHI wrote:
> > + nleft;  /* item pointers we left */
>
> The name seems to be something other, and the comment doesn't
> make sense, at least for me. Looking below,
>
> + "%.0f tuples are left as dead.\n",
> + nleft),
> + nleft);
>
> How about "nleft_dead; /* item pointers left as dead */"?

Fixed.

> In this block:
>
> - if (nindexes == 0 &&
> + if ((nindexes == 0 || skip_index_vacuum) &&
>       vacrelstats->num_dead_tuples > 0)
>   {
>
> Is it right that vacuumed_pages is incremented and FSM is updated
> while the page is not actually vacuumed?

Good catch. I think the FSM stuff is right because we actually did HOT pruning, but the increment of vacuumed_pages seems wrong to me. I think we should not increment it and not report the "removed XX row versions in YY pages" message.

> > tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
> > -                                  &vacrelstats->latestRemovedXid);
> > +                                  &vacrelstats->latestRemovedXid,
> > +                                  &tups_pruned);
>
> tups_pruned looks as "HOT pruned tuples". It is named "unused" in
> the function's parameters. (But I think it is useless. Please see
> the details below.)
>
> I tested it with a simple data set.
>
> (autovacuum = off)
> drop table if exists t;
> create table t with (fillfactor=50) as select a, a % 3 as b from
> generate_series(1, 9) a;
> create index on t(a);
> update t set a = -a where b = 0;
> update t set b = b + 1 where b = 1;
>
> We now have 9 tuples, 15 versions, and 3 out of 6 "old" tuples are
> to be "left dead" by DISABLE_INDEX_CLEANUP vacuum. It means
> three tuples end up "left dead", three tuples are removed and
> 12 tuples will survive the vacuum below.
> > vacuum (verbose, freeze, disable_index_cleanup, disable_page_skipping) t;
>
> > INFO: "t": removed 0 row versions in 1 pages
> > INFO: "t": found 0 removable, 9 nonremovable row versions in 1 out of 1 pages
> > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 925
> > There were 0 unused item pointers.
> > Skipped 0 pages due to buffer pins, 0 frozen pages.
> > 0 pages are entirely empty.
> > 3 tuples are left as dead.
>
> Three tuple versions have vanished. Actually they were removed
> but not shown in the message.
>
> heap_prune_chain() doesn't count a live root entry of a chain as
> "unused (line pointer)" since it is marked as "redirected". As
> a result the vanished tuples are counted in tups_vacuumed, not
> in tups_pruned. Maybe the name tups_vacuumed is confusing. After
> removing the tups_pruned code it works correctly.
>
> > INFO: "t": removed 6 row versions in 1 pages
> > INFO: "t": found 6 removable, 9 nonremovable row versions in 1 out of 1 pages
> > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 932
> > There were 0 unused item pointers.
> > Skipped 0 pages due to buffer pins, 0 frozen pages.
> > 0 pages are entirely empty.
> > 3 tuples are left as dead.
>
> I see two choices for the second line above.
>
> 1> "t": found 6 removable, 9 nonremovable row versions in 1 out of 1 pages
>
> "removable" includes "left dead" tuples.
>
> 2> "t": found 3 removable, 12 nonremovable row versions in 1 out of 1 pages
>
> "removable" excludes "left dead" tuples.
>
> If you prefer the latter, removable and nonremovable need to be
> corrected using nleft.

I think that the first vacuum should report the former message because it's true that the table has 6 removable tuples. We remove 6 tuples but leave 3 item pointers. So in the second vacuum, it should be "found 0 removable, 9 nonremovable row versions ..." and "3 tuples are left as dead".
But to report more precisely, it'd be better to report "0 tuples and 3 item identifiers are left as dead" here.

The attached updated patch incorporates all of the comments. Also, I've added a new reloption vacuum_index_cleanup, as per the discussion on the "reloption to prevent VACUUM from truncating empty pages at the end of relation" thread. Autovacuum can also skip index cleanup when the reloption is set to false. Since setting this to false might lead to some problems, I've made autovacuum report the number of dead tuples and dead item IDs we left behind.

Regards, Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

From 70358ea1ae7d7c8f78d9101176f4cc1652c3c978 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada
Date: Fri, 1 Feb 2019 16:02:50 +0100
Subject: [PATCH v8 2/2] Add --disable-index-cleanup option to vacuumdb.
---
 doc/src/sgml/ref/vacuumdb.sgml    | 15 +++
 src/bin/scripts/t/100_vacuumdb.pl |  9 -
 src/bin/scripts/vacuumdb.c        | 29
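The item-pointer accounting debated above — removable tuple versions versus dead item pointers that are "left as dead" when index cleanup is skipped — can be modeled with a small sketch. This is an illustrative simplification, not lazy_scan_heap()'s actual bookkeeping:

```python
# Each HOT chain is a list of version states.  Pruning reclaims dead
# versions; a chain with no live version leaves behind an LP_DEAD item
# pointer, which can only be marked unused after index cleanup removes
# the index entries pointing at it.

def prune_page(chains, index_cleanup):
    removed = 0      # dead tuple versions reclaimed by pruning
    left_dead = 0    # item pointers "left as dead"
    survivors = 0
    for chain in chains:
        live = sum(1 for v in chain if v == "live")
        removed += len(chain) - live
        survivors += live
        if live == 0 and not index_cleanup:
            # With index cleanup disabled, the dead root item pointer
            # stays behind instead of becoming unused.
            left_dead += 1
    return {"removed": removed, "survivors": survivors, "left_dead": left_dead}

# One untouched tuple, one updated tuple (old + new version), one deleted tuple:
page = [["live"], ["dead", "live"], ["dead"]]
print(prune_page(page, index_cleanup=False))
# {'removed': 2, 'survivors': 2, 'left_dead': 1}
```

This mirrors the message Sawada proposes: the removed versions and the left-dead item pointers are separate counts, so reporting them separately ("0 tuples and 3 item identifiers are left as dead") avoids the ambiguity discussed above.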
Re: Update does not move row across foreign partitions in v11
On 2019/03/06 12:47, David Rowley wrote:
> On Wed, 6 Mar 2019 at 16:29, Etsuro Fujita wrote:
>> That means that rows can be moved from a local partition to a foreign
>> partition if the FDW supports it.
>
> It seems a bit light on detail to me. If I was a user I'd want to know
> what exactly the FDW needed to support this. Does it need a special
> partition move function? Looking at ExecFindPartition(), this check
> seems to be done in CheckValidResultRel() and is basically:
>
> case RELKIND_FOREIGN_TABLE:
>     /* Okay only if the FDW supports it */
>     fdwroutine = resultRelInfo->ri_FdwRoutine;
>     switch (operation)
>     {
>         case CMD_INSERT:
>             if (fdwroutine->ExecForeignInsert == NULL)
>                 ereport(ERROR,
>                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>                          errmsg("cannot insert into foreign table \"%s\"",
>                                 RelationGetRelationName(resultRel))));
>
> Alternatively, we could just remove the mention about "if the FDW
> supports it", since it's probably unlikely for an FDW not to support
> INSERT.

AFAIK, there's no special support in FDWs for "tuple moving" as such. The "if the FDW supports it" refers to the FDW's ability to handle tuple routing. Note that moving/re-routing involves calling ExecPrepareTupleRouting followed by ExecInsert on the new tuples after the old tuple is deleted. If an FDW doesn't support tuple routing, then a tuple cannot be moved into it. That's what that text is talking about.

Maybe we should reword it as "if the FDW supports tuple routing", so that a reader doesn't go looking around for "tuple moving support" in FDWs.

Thanks, Amit
Re: Update does not move row across foreign partitions in v11
Fujita-san, On 2019/03/06 13:04, Etsuro Fujita wrote: > (2019/03/06 11:34), Amit Langote wrote: >> Ah, indeed. In the documentation fix patch I'd posted, I also made >> changes to release-11.sgml to link to the limitations section. (I'm >> attaching it here for your reference.) > > I'm not sure it's a good idea to make changes to the release notes like > that, because 1) that would make the release notes verbose, and 2) it > might end up doing the same thing to items that have some limitations in > the existing/future release notes (eg, FOR EACH ROW triggers on > partitioned tables added to V11 has the limitation listed on the > limitation section, so the same link would be needed.), for consistency. OK, sure. It just seemed to me that the original complainer found it quite a bit surprising that such a limitation is not mentioned in the release notes, but maybe that's fine. It seems we don't normally list feature limitations in the release notes, which as you rightly say, would make them verbose. The main problem here is indeed that the limitation is not listed under the partitioning limitations in ddl.sgml, where it's easier to notice than in the UPDATE's page. I've updated my patch to remove the release-11.sgml changes. Thanks, Amit diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 7bed4f56f0..3b20e73a61 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -3376,6 +3376,13 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02 + + + UPDATE row movement is not supported in the cases + where the old row is contained in a foreign table partition. + + + BEFORE ROW triggers, if necessary, must be defined
Re: Converting NOT IN to anti-joins during planning
Hi Jim, Thanks for replying here.

On Wed, 6 Mar 2019 at 16:37, Jim Finnerty wrote:
> Actually, we're working hard to integrate the two approaches. I haven't had
> time since I returned to review your patch, but I understand that you were
> checking for strict predicates as part of the nullness checking criteria,
> and we definitely must have that. Zheng tells me that he has combined your
> patch with ours, but before we put out a new patch, we're trying to figure
> out how to preserve the existing NOT IN execution plan in the case where the
> materialized subplan fits in memory. This (good) plan is effectively an
> in-memory hash anti-join.
>
> This is tricky to do because the NOT IN Subplan to anti-join transformation
> currently happens early in the planning process, whereas the decision to
> materialize is made much later, when the best path is being converted into a
> Plan.

I guess you're still going with the OR ... IS NULL in your patch then? Otherwise, you'd likely find that the transformation (when NULLs are not possible) is always a win since it'll allow hash anti-joins. (See #2 in the original email on this thread.)

FWIW, I mentioned in [1] and Tom confirmed in [2] that we both think hacking the join condition to add an OR .. IS NULL is a bad idea. I guess you're not deterred by that? I'd say your next best move is, over on the other thread, to put up your argument against what Tom and I mentioned, then detail out what exactly you're planning. Likely this will save time. I personally don't think that ignoring this part is going to allow you to progress your patch much further in PostgreSQL. Consensus about how $thing works is something that's needed before the $thing can ever be committed. Sometimes lack of objection can count, but an unaddressed objection is not consensus. Not trying to make this hard, just trying to explain the process.
[1] https://www.postgresql.org/message-id/CAKJS1f8q4S%2B5Z7WSRDWJd__SwqMr12JdWKXTDo35ptzneRvZnw%40mail.gmail.com [2] https://www.postgresql.org/message-id/5420.1551487529%40sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
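For readers following the NULL discussion in this thread: a plain anti-join is only equivalent to NOT IN when neither side can produce a NULL, because of SQL's three-valued logic. A toy model of the evaluation, with Python's None standing in for SQL NULL/UNKNOWN (illustrative only, not the planner's code):

```python
# x NOT IN (subquery): returns True, False, or None (UNKNOWN).  Once the
# subquery can produce a NULL, the result can never be TRUE for a
# non-matching row -- it degrades to UNKNOWN, and the row is filtered
# out.  That is why the anti-join transformation is only valid when
# NULLs are provably impossible on both sides.

def not_in(x, subquery_rows):
    if x is None:
        # NOT IN over an empty set is TRUE even for a NULL outer value;
        # otherwise every comparison is UNKNOWN.
        return True if not subquery_rows else None
    saw_null = False
    for v in subquery_rows:
        if v is None:
            saw_null = True
        elif v == x:
            return False          # definite match -> FALSE
    return None if saw_null else True

print(not_in(1, [2, 3]))      # True  -> row passes the filter
print(not_in(1, [1, 2]))      # False
print(not_in(1, [2, None]))   # None  -> UNKNOWN, row is filtered out
```

The last case is the troublesome one: a single NULL in the subquery output silently empties the NOT IN result, which a naive anti-join would not reproduce.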
Re: Update does not move row across foreign partitions in v11
(2019/03/06 11:34), Amit Langote wrote:
> Ah, indeed. In the documentation fix patch I'd posted, I also made
> changes to release-11.sgml to link to the limitations section. (I'm
> attaching it here for your reference.)

I'm not sure it's a good idea to make changes to the release notes like that, because 1) that would make the release notes verbose, and 2) for consistency it might end up requiring the same thing for items that have limitations in the existing/future release notes (eg, FOR EACH ROW triggers on partitioned tables, added in v11, have their limitation listed in the limitations section, so the same link would be needed there too).

Best regards, Etsuro Fujita
Re: [PATCH v20] GSSAPI encryption support
Stephen Frost writes:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>
>> Or maybe we avoid that, and you rename be-secure-gssapi.c to just
>> be-gssapi.c and also combine that with the contents of
>> be-gssapi-common.c.
>
> I don't know why we would need to, or want to, combine
> be-secure-gssapi.c and be-gssapi-common.c, they do have different
> roles in that be-gssapi-common.c is used even if you aren't doing
> encryption, while be-secure-gssapi.c is specifically for handling the
> encryption side of things.
>
> Then again, be-gssapi-common.c does currently only have the one
> function in it, and it's not like there's an option to compile for
> *just* GSS authentication (and not encryption), or at least, I don't
> think that would be a useful option to have.. Robbie?

Yeah, I think I'm opposed to making that an option. Worth pointing out here: up until v6, I had this structured differently, with all the GSSAPI code in fe-gssapi.c and be-gssapi.c. The current separation was suggested by Michael Paquier for ease of reading and to keep the code churn down.

>> (And similarly in libpq.)
>
> I do agree that we should try to keep the frontend and backend code
> structures similar in this area, that seems to make sense.

Well, I don't know if it's an argument in either direction, but: on the frontend we have about twice as much shared code in fe-gssapi-common.c (pg_GSS_have_ccache() and pg_GSS_load_servicename()).

>> I don't see any tests in the patch. We have a Kerberos test suite at
>> src/test/kerberos/ and an SSL test suite at src/test/ssl/. You can
>> get some ideas there.
>
> Yeah, I was going to comment on that as well. We definitely should
> implement tests around this.

Attached. Please note that I don't really speak Perl. There's a pile of duplicated code in 002_enc.pl that probably should be shared between the two. (It would also, I think, be possible for 001_auth.pl to set up the KDC and for 002_enc.pl to then use it.)
Thanks, --Robbie signature.asc Description: PGP signature >From 42ab1ccae8e517934866ee923d80554ef1996709 Mon Sep 17 00:00:00 2001 From: Robbie Harwood Date: Tue, 5 Mar 2019 22:54:11 -0500 Subject: [PATCH] Add tests for GSSAPI/krb5 encryption --- src/test/kerberos/t/002_enc.pl | 197 + 1 file changed, 197 insertions(+) create mode 100644 src/test/kerberos/t/002_enc.pl diff --git a/src/test/kerberos/t/002_enc.pl b/src/test/kerberos/t/002_enc.pl new file mode 100644 index 00..927abe15e4 --- /dev/null +++ b/src/test/kerberos/t/002_enc.pl @@ -0,0 +1,197 @@ +use strict; +use warnings; +use TestLib; +use PostgresNode; +use Test::More; +use File::Path 'remove_tree'; + +if ($ENV{with_gssapi} eq 'yes') +{ + plan tests => 5; +} +else +{ + plan skip_all => 'GSSAPI/Kerberos not supported by this build'; +} + +my ($krb5_bin_dir, $krb5_sbin_dir); + +if ($^O eq 'darwin') +{ + $krb5_bin_dir = '/usr/local/opt/krb5/bin'; + $krb5_sbin_dir = '/usr/local/opt/krb5/sbin'; +} +elsif ($^O eq 'freebsd') +{ + $krb5_bin_dir = '/usr/local/bin'; + $krb5_sbin_dir = '/usr/local/sbin'; +} +elsif ($^O eq 'linux') +{ + $krb5_sbin_dir = '/usr/sbin'; +} + +my $krb5_config = 'krb5-config'; +my $kinit= 'kinit'; +my $kdb5_util= 'kdb5_util'; +my $kadmin_local = 'kadmin.local'; +my $krb5kdc = 'krb5kdc'; + +if ($krb5_bin_dir && -d $krb5_bin_dir) +{ + $krb5_config = $krb5_bin_dir . '/' . $krb5_config; + $kinit = $krb5_bin_dir . '/' . $kinit; +} +if ($krb5_sbin_dir && -d $krb5_sbin_dir) +{ + $kdb5_util= $krb5_sbin_dir . '/' . $kdb5_util; + $kadmin_local = $krb5_sbin_dir . '/' . $kadmin_local; + $krb5kdc = $krb5_sbin_dir . '/' . 
$krb5kdc; +} + +my $host = 'auth-test-localhost.postgresql.example.com'; +my $hostaddr = '127.0.0.1'; +my $realm = 'EXAMPLE.COM'; + +my $krb5_conf = "${TestLib::tmp_check}/krb5.conf"; +my $kdc_conf= "${TestLib::tmp_check}/kdc.conf"; +my $krb5_log= "${TestLib::tmp_check}/krb5libs.log"; +my $kdc_log = "${TestLib::tmp_check}/krb5kdc.log"; +my $kdc_port= int(rand() * 16384) + 49152; +my $kdc_datadir = "${TestLib::tmp_check}/krb5kdc"; +my $kdc_pidfile = "${TestLib::tmp_check}/krb5kdc.pid"; +my $keytab = "${TestLib::tmp_check}/krb5.keytab"; + +note "setting up Kerberos"; + +my ($stdout, $krb5_version); +run_log [ $krb5_config, '--version' ], '>', \$stdout + or BAIL_OUT("could not execute krb5-config"); +BAIL_OUT("Heimdal is not supported") if $stdout =~ m/heimdal/; +$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/ + or BAIL_OUT("could not get Kerberos version"); +$krb5_version = $1; + +append_to_file( + $krb5_conf, + qq![logging] +default = FILE:$krb5_log +kdc = FILE:$kdc_log + +[libdefaults] +default_realm = $realm + +[realms] +$realm = { +kdc = $hostaddr:$kdc_port +}!); + +append_to_file( + $kdc_conf, + qq![kdcdefaults] +!); + +# For new-enough versions of krb5, use the _listen settings rather +# than the _ports settings so that we can bind to localhost only. +if
Re: Update does not move row across foreign partitions in v11
On Wed, 6 Mar 2019 at 16:29, Etsuro Fujita wrote:
> (2019/03/06 11:06), David Rowley wrote:
>> I don't quite understand what a "foreign table to some other
>> partition" is meant to mean. Partitions don't have foreign tables,
>> they can only be one themselves.
>
> I think "foreign table" is describing "partition" in front of that; "a
> partition that is a foreign table".

I think I was reading this wrong:

- Currently, rows cannot be moved from a partition that is a
- foreign table to some other partition, but they can be moved into a foreign
- table if the foreign data wrapper supports it.

I parsed it as "cannot be moved from a partition, that is a foreign table to some other partition" and subsequently struggled with what "a foreign table to some other partition" is. But now looking at it, I think it's meant to mean: "cannot be moved from a foreign table partition to another partition".

>> I've tried to put all this right again in the attached. However, I was
>> a bit unsure of what "but they can be moved into a foreign table if
>> the foreign data wrapper supports it." is referring to. Copying Robert
>> and Etsuro as this was all added in 3d956d9562aa. Hopefully, they can
>> confirm what is meant by this.
>
> That means that rows can be moved from a local partition to a foreign
> partition if the FDW supports it.

It seems a bit light on detail to me. If I was a user I'd want to know what exactly the FDW needed to support this. Does it need a special partition move function?
Looking at ExecFindPartition(), this check seems to be done in CheckValidResultRel() and is basically:

case RELKIND_FOREIGN_TABLE:
    /* Okay only if the FDW supports it */
    fdwroutine = resultRelInfo->ri_FdwRoutine;
    switch (operation)
    {
        case CMD_INSERT:
            if (fdwroutine->ExecForeignInsert == NULL)
                ereport(ERROR,
                        (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                         errmsg("cannot insert into foreign table \"%s\"",
                                RelationGetRelationName(resultRel))));

Alternatively, we could just remove the mention about "if the FDW supports it", since it's probably unlikely for an FDW not to support INSERT.

> IMO, I think the existing mention in [3] is good, so I would vote for
> putting the same mention in table 5.10.2.3 in [2] as well.

I think the sentence is unclear; at least, I struggled to parse it the first time. Happy for Amit to choose some better words and include them in his patch. I think it should be done in the same commit.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Update does not move row across foreign partitions in v11
(2019/03/06 11:06), David Rowley wrote:
> On Tue, 5 Mar 2019 at 03:01, Derek Hans wrote:
>> Based on a reply to reporting this as a bug, moving rows out of foreign
>> partitions is not yet implemented so this is behaving as expected.
>> There's a mention of this limitation in the Notes section of the Update
>> docs.
>
> (Moving this discussion to -Hackers)
>
> In [1], Derek reports that once a row is inserted into a foreign
> partition that an UPDATE does not correctly route it back out into the
> correct partition.
>
> I didn't really follow the foreign partition code when it went in, but
> do recall being involved in the documentation about the limitations of
> partitioned tables in table 5.10.2.3 in [2]. Unfortunately, table
> 5.10.2.3 does not seem to mention this limitation at all.
>
> As Derek mentions, there is a brief mention in [3] in the form of:
>
> "Currently, rows cannot be moved from a partition that is a foreign
> table to some other partition, but they can be moved into a foreign
> table if the foreign data wrapper supports it."
>
> I don't quite understand what a "foreign table to some other
> partition" is meant to mean. Partitions don't have foreign tables,
> they can only be one themselves.

I think "foreign table" is describing "partition" in front of that; "a partition that is a foreign table".

> I've tried to put all this right again in the attached. However, I was
> a bit unsure of what "but they can be moved into a foreign table if
> the foreign data wrapper supports it." is referring to. Copying Robert
> and Etsuro as this was all added in 3d956d9562aa. Hopefully, they can
> confirm what is meant by this.

That means that rows can be moved from a local partition to a foreign partition if the FDW supports it.

IMO, I think the existing mention in [3] is good, so I would vote for putting the same mention in table 5.10.2.3 in [2] as well.

Best regards, Etsuro Fujita
Re: Rare SSL failures on eelpout
On Wed, Mar 6, 2019 at 4:07 PM Shawn Debnath wrote: > On Wed, Mar 06, 2019 at 11:13:31AM +1300, Thomas Munro wrote: > > So... can anyone tell us what happens on Windows? > C:\Users\Shawn Debnath\Desktop>c:\Python27\python.exe tmunro-ssl-test.py > --client > Sending A... > 2 > Sending B... > [Errno 10054] An existing connection was forcibly closed by the remote host > Sending C... > [Errno 10054] An existing connection was forcibly closed by the remote host > Traceback (most recent call last): > File "tmunro-ssl-test.py", line 57, in > client() > File "tmunro-ssl-test.py", line 51, in client > print s.recv(1024) > socket.error: [Errno 10054] An existing connection was forcibly closed by the > remote host Thanks! Wow, so not only can we not read the final message sent by the server, if we try we get an error. That's... not what we want. I wonder if there might be a way to put the socket into don't-do-that mode... -- Thomas Munro https://enterprisedb.com
Re: few more wait events to add to docs
On Wed, Mar 06, 2019 at 02:21:15PM +1300, Thomas Munro wrote:
> Here's some missing documentation.

The addition looks fine to me.

> Hmm, yeah. I wish these were alphabetised, I wish there was an
> automated warning about this, I wish these tranches were declared a
> better way than by adding code in RegisterLWLockTranches().

Why not use this occasion to reorganize the LWLock and Lock sections so that their entries are fully alphabetized? I made an effort in this direction in 5ef037c for all the sections except these two, and honestly getting all of these organized would really help documentation readers.

-- Michael
Tab completion for SKIP_LOCKED option
Hi, I realized that the tab completions for the SKIP_LOCKED option of both VACUUM and ANALYZE are missing. The attached patch adds them.

Regards,
-- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: Online verification of checksums
Greetings, On Tue, Mar 5, 2019 at 18:36 Michael Paquier wrote: > On Tue, Mar 05, 2019 at 02:08:03PM +0100, Tomas Vondra wrote: > > Based on quickly skimming that thread the main issue seems to be > > deciding which files in the data directory are expected to have > > checksums. Which is a valid issue, of course, but I was expecting > > something about partial read/writes etc. > > I remember complaining about partial write handling as well for the > base backup checks... There should be an email about it on the list, > cannot find it now ;p > > > My understanding is that: > > > > (a) The checksum verification should not generate false positives (same > > as for basebackup). > > > > (b) The partial reads do emit warnings, which might be considered false > > positives I guess. Which is why I'm arguing for changing it to do the > > same thing basebackup does, i.e. ignore this. > > Well, at least that's consistent... Argh, I really think that we > ought to make the failures reported harder because that's easier to > detect within a tool and some deployments set log_min_messages > > WARNING so checksum failures would just be lost. For base backups we > don't care much about that as files are just blindly copied so they > could have torn pages, which is fine as that's fixed at replay. Now > we are talking about a set of tools which could have reliable > detection mechanisms for those problems. I’m traveling but will try to comment more in the coming days but in general I agree with Tomas on these items. Also, pg_basebackup has to handle torn pages when it comes to checksums just like the verify tool does, and having them be consistent (along with external tools) would really be for the best, imv. I still feel like a retry of a short read (try reading more to get the whole page..) would be alright and reading until we hit eof and then moving on. 
I'm not sure it's possible, but I do worry a bit that we might get a short read from a network file system or something that isn't actually at EOF, and then we would skip a significant remaining portion of the file... Another thought might be to stat the file after we have opened it to see its length... Just a few thoughts since I'm on my phone. Will try to write up something more in a day or two. Thanks! Stephen
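The retry Stephen describes — keep reading until a full page is assembled or a true EOF is seen, so an OS-level short read isn't misreported as a checksum failure — can be sketched like this. This is a simplified model: read_fn stands in for a file-descriptor read, and 8192 matches PostgreSQL's default BLCKSZ:

```python
# Retry short reads until a whole page is assembled or EOF is reached.
# A read that returns fewer bytes than requested is not treated as the
# end of the file; only an empty read is.

BLCKSZ = 8192

def read_full_page(read_fn):
    buf = b""
    while len(buf) < BLCKSZ:
        chunk = read_fn(BLCKSZ - len(buf))
        if chunk == b"":      # true EOF: return what we have (possibly a
            break             # torn tail page, which the caller can skip)
        buf += chunk
    return buf

# A mock reader that hands data back in short chunks, the way a network
# filesystem might:
def make_chunked_reader(data, chunk=1000):
    pos = [0]
    def read_fn(n):
        take = min(n, chunk, len(data) - pos[0])
        out = data[pos[0]:pos[0] + take]
        pos[0] += take
        return out
    return read_fn

page = read_full_page(make_chunked_reader(b"x" * BLCKSZ))
print(len(page))   # 8192: short reads were retried until the page was whole
```

Stephen's remaining worry maps onto the EOF branch: if the filesystem returns an empty read before the real end of the file, this loop would stop early, which is why cross-checking against the stat()ed length is an appealing extra safeguard.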
Re: Online verification of checksums
On Tue, Mar 05, 2019 at 02:08:03PM +0100, Tomas Vondra wrote: > Based on quickly skimming that thread the main issue seems to be > deciding which files in the data directory are expected to have > checksums. Which is a valid issue, of course, but I was expecting > something about partial read/writes etc. I remember complaining about partial write handling as well for the base backup checks... There should be an email about it on the list, cannot find it now ;p > My understanding is that: > > (a) The checksum verification should not generate false positives (same > as for basebackup). > > (b) The partial reads do emit warnings, which might be considered false > positives I guess. Which is why I'm arguing for changing it to do the > same thing basebackup does, i.e. ignore this. Well, at least that's consistent... Argh, I really think that we ought to make the failures reported harder because that's easier to detect within a tool and some deployments set log_min_messages > WARNING so checksum failures would just be lost. For base backups we don't care much about that as files are just blindly copied so they could have torn pages, which is fine as that's fixed at replay. Now we are talking about a set of tools which could have reliable detection mechanisms for those problems. -- Michael
Re: Pluggable Storage - Andres's take
Hi, Thanks for looking!

On 2019-03-05 18:27:45 -0800, Ashwin Agrawal wrote:
> While playing with the tableam, usage of which starts with commit
> v12-0023-tableam-Introduce-and-use-begin-endscan-and-do-i.patch, should we
> check for a NULL function pointer before actually calling the same, and ERROR
> out instead as NOT_SUPPORTED or something on those lines.

Scans seem like an absolutely required part of the functionality, so I don't think there's much point in that. It'd just bloat code and runtime.

> Understand it's the kind of thing which should get caught during development.
> But still, currently it segfaults if missing to define some AM function,

The segfault itself doesn't bother me at all, it's just a NULL pointer dereference. If we were to put Asserts somewhere it'd crash very similarly. I think you have a point in that:

> might be easier for iterative development to error instead in a common place.

Would make it a tiny bit easier to implement a new AM. We could probably add a few asserts to GetTableAmRoutine(), to check that required functions are implemented. Don't think that'd make a meaningful difference for something like the scan functions, but it'd probably make it easier to forward-port AMs to the next release - I'm pretty sure we're going to add required callbacks in the next few releases.

Greetings, Andres Freund
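The registration-time check Andres suggests for GetTableAmRoutine() can be sketched as follows — a toy model with made-up callback names, not the actual table AM API:

```python
# Validate a table AM's dispatch table up front, so a missing required
# callback fails with a clear error at registration time instead of a
# NULL-pointer crash at first use.  The callback names are illustrative.

REQUIRED_CALLBACKS = ("scan_begin", "scan_getnextslot", "scan_end")

def get_table_am_routine(routine):
    missing = [cb for cb in REQUIRED_CALLBACKS if routine.get(cb) is None]
    if missing:
        raise NotImplementedError(
            "table AM is missing required callbacks: %s" % ", ".join(missing))
    return routine

complete_am = {cb: (lambda *a: None) for cb in REQUIRED_CALLBACKS}
get_table_am_routine(complete_am)        # passes validation

try:
    get_table_am_routine({"scan_begin": lambda *a: None})
except NotImplementedError as e:
    print(e)   # names the missing callbacks instead of segfaulting later
```

The trade-off discussed above is visible here: the check costs nothing per scan because it runs once when the routine is fetched, while still giving AM authors an immediate, descriptive failure during iterative development.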
Re: Update does not move row across foreign partitions in v11
On 2019/03/06 11:29, David Rowley wrote: > On Wed, 6 Mar 2019 at 15:26, Amit Langote > wrote: >> >>> I've tried to put all this right again in the attached. However, I was >>> a bit unsure of what "but they can be moved into a foreign table if >>> the foreign data wrapper supports it." is referring to. Copying Robert >>> and Etsuro as this was all added in 3d956d9562aa. Hopefully, they can >>> confirm what is meant by this. >> >> Did you miss my reply on that thread? >> >> https://www.postgresql.org/message-id/CA%2BHiwqF3gma5HfCJb4_cOk0_%2BLEpVc57EHdBfz_EKt%2BNu0hNYg%40mail.gmail.com > > Yes. I wasn't aware that there were two threads for this. Ah, indeed. In the documentation fix patch I'd posted, I also made changes to release-11.sgml to link to the limitations section. (I'm attaching it here for your reference.) Thanks, Amit diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 7bed4f56f0..3b20e73a61 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -3376,6 +3376,13 @@ ALTER TABLE measurement ATTACH PARTITION measurement_y2008m02 + + + UPDATE row movement is not supported in the cases + where the old row is contained in a foreign table partition. + + + BEFORE ROW triggers, if necessary, must be defined diff --git a/doc/src/sgml/release-11.sgml b/doc/src/sgml/release-11.sgml index 14e2726f0c..d6fc5a3e31 100644 --- a/doc/src/sgml/release-11.sgml +++ b/doc/src/sgml/release-11.sgml @@ -2578,6 +2578,13 @@ Branch: REL9_3_STABLE [84261eb10] 2018-10-19 17:02:26 -0400 column now cause affected rows to be moved to the appropriate partitions (Amit Khandekar) + + +However, not all cases where such row movment would be necessary +are handled currently; see +declarative +partitioning limitations for more information. +
Re: Update does not move row across foreign partitions in v11
On Wed, 6 Mar 2019 at 15:26, Amit Langote wrote: > > > I've tried to put all this right again in the attached. However, I was > > a bit unsure of what "but they can be moved into a foreign table if > > the foreign data wrapper supports it." is referring to. Copying Robert > > and Etsuro as this was all added in 3d956d9562aa. Hopefully, they can > > confirm what is meant by this. > > Did you miss my reply on that thread? > > https://www.postgresql.org/message-id/CA%2BHiwqF3gma5HfCJb4_cOk0_%2BLEpVc57EHdBfz_EKt%2BNu0hNYg%40mail.gmail.com Yes. I wasn't aware that there were two threads for this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Pluggable Storage - Andres's take
Hi, While playing with the tableam, usage of which starts with commit v12-0023-tableam-Introduce-and-use-begin-endscan-and-do-i.patch, should we check for a NULL function pointer before actually calling the same and ERROR out instead as NOT_SUPPORTED or something along those lines? I understand it's the kind of thing which should get caught during development. But still, currently it segfaults if some AM function is missing; it might be easier for iterative development to error instead in a common place. Or should there be an upfront check for NULL somewhere, if all the AM functions are mandatory and should not be NULL?
Re: Patch to document base64 encoding
On Tue, Mar 05, 2019 at 07:55:22PM -0600, Karl O. Pinc wrote: > Attached: doc_base64_v4.patch Details about the "escape" mode are already available within the description of function "encode". Wouldn't we want to consolidate a description for all the modes in the same place, including some words for hex? Your patch only includes the description of base64, which is a good addition, but still not consistent with the rest. A paragraph after all the listed functions is fine, I think, as the description is long and would bloat the table if included directly. -- Michael
Re: Prevent extension creation in temporary schemas
On Tue, Mar 05, 2019 at 12:47:54PM +, Chris Travers wrote: > I tried installing a test extension into a temp schema. I found > this was remarkably difficult to do because pg_temp did not work (I > had to create a temporary table and then locate the actual table it > was created in). While that might also be a bug it is not in the > scope of this patch so mostly noting in terms of future work. pgcrypto works in this case. > After creating the extension I did as follows: > \dx in the current session shows the extension > \dx in a stock psql shows the extension in a separate session > \dx with a patched psql in a separate session does not show the > extension. > > In terms of the scope of this patch, I think this correctly and > fully solves the problem at hand. I was just looking at this patch this morning with fresh eyes, and I think that I have found one argument to *not* apply it. Imagine the following in one session: =# create extension pgcrypto with schema pg_temp_3; CREATE EXTENSION =# \dx List of installed extensions Name | Version | Schema | Description --+-++-- pgcrypto | 1.3 | pg_temp_3 | cryptographic functions plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language (2 rows) That's all good, we see that the session which created this extension has it listed. Now let's use a second session in parallel: =# create extension pgcrypto with schema pg_temp_4; ERROR: 42710: extension "pgcrypto" already exists LOCATION: CreateExtension, extension.c:1664 =# \dx List of installed extensions Name | Version | Schema | Description --+-++-- plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language (1 row) This is actually also good, because the extension of the temporary schema of the first session does not show up. Now I think that this can actually bring some confusion to the user, because the extension is not listed via \dx, but trying to create it with a different schema fails. Thoughts? -- Michael
RE: speeding up planning with partitions
Amit-san, On Tue, Mar 5, 2019 at 10:24 AM, Amit Langote wrote: > On 2019/03/05 9:50, Amit Langote wrote: > > I'll post the updated patches after diagnosing what I'm suspecting a > > memory over-allocation bug in one of the patches. If you configure > > build with --enable-cassert, you'll see that throughput decreases as > > number of partitions run into many thousands, but it doesn't when > > asserts are turned off. > > Attached an updated version. This incorporates fixes for both Jesper's > and Imai-san's review. Thanks for updating the patches! Here is my code review of the previous v26 patches. [0002] In expand_inherited_rtentry(): expand_inherited_rtentry() { ... + RelOptInfo *rel = NULL; can be declared later: if (oldrelation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) ... else { List *appinfos = NIL; RangeTblEntry *childrte; Index childRTindex; +RelOptInfo *rel = NULL; [0003] In inheritance_planner: + rtable_with_target = list_copy(root->parse->rtable); can be: + rtable_with_target = list_copy(parse->rtable); [0004 or 0005] There is some redundant processing in add_appendrel_other_rels (or expand_xxx_rtentry()?). I modified add_appendrel_other_rels like below and it actually worked. add_appendrel_other_rels(PlannerInfo *root, RangeTblEntry *rte, Index rti) { ListCell *l; RelOptInfo *rel; /* * Add inheritance children to the query if not already done. For child * tables that are themselves partitioned, their children will be added * recursively. 
*/ if (rte->rtekind == RTE_RELATION && !root->contains_inherit_children) { expand_inherited_rtentry(root, rte, rti); return; } rel = find_base_rel(root, rti); foreach(l, root->append_rel_list) { AppendRelInfo *appinfo = lfirst(l); Index childRTindex = appinfo->child_relid; RangeTblEntry *childrte; RelOptInfo *childrel; if (appinfo->parent_relid != rti) continue; Assert(childRTindex < root->simple_rel_array_size); childrte = root->simple_rte_array[childRTindex]; Assert(childrte != NULL); build_simple_rel(root, childRTindex, rel); /* Child may itself be an inherited relation. */ if (childrte->inh) add_appendrel_other_rels(root, childrte, childRTindex); } } > and Imai-san's review. I haven't been able to pin down the bug (or > whatever) that makes throughput go down as the partition count increases, > when tested with a --enable-cassert build. I didn't investigate that problem, but there is another memory increase issue, which is because of 0003 patch I think. I'll try to solve the latter issue. -- Yoshikazu Imai
Re: Update does not move row across foreign partitions in v11
On Tue, 5 Mar 2019 at 03:01, Derek Hans wrote: > Based on a reply to reporting this as a bug, moving rows out of foreign > partitions is not yet implemented so this is behaving as expected. There's a > mention of this limitation in the Notes section of the Update docs. (Moving this discussion to -Hackers) In [1], Derek reports that once a row is inserted into a foreign partition that an UPDATE does not correctly route it back out into the correct partition. I didn't really follow the foreign partition code when it went in, but do recall being involved in the documentation about the limitations of partitioned tables in table 5.10.2.3 in [2]. Unfortunately, table 5.10.2.3 does not seem to mention this limitation at all. As Derek mentions, there is a brief mention in [3] in the form of: "Currently, rows cannot be moved from a partition that is a foreign table to some other partition, but they can be moved into a foreign table if the foreign data wrapper supports it." I don't quite understand what a "foreign table to some other partition" is meant to mean. Partitions don't have foreign tables, they can only be one themselves. I've tried to put all this right again in the attached. However, I was a bit unsure of what "but they can be moved into a foreign table if the foreign data wrapper supports it." is referring to. Copying Robert and Etsuro as this was all added in 3d956d9562aa. Hopefully, they can confirm what is meant by this. [1] https://www.postgresql.org/message-id/cagrp7a3xc1qy_b2wjcgad8uqts_ndcjn06o5mts_ne1nyhb...@mail.gmail.com [2] https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-LIMITATIONS [3] https://www.postgresql.org/docs/devel/sql-update.html -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services doc_confirm_foreign_partition_limitations.patch Description: Binary data
Re: Fix memleaks and error handling in jsonb_plpython
On Tue, Mar 05, 2019 at 02:10:01PM +0300, Nikita Glukhov wrote: > I known about this volatility issues, but maybe I incorrectly understand what > should be marked as volatile for pointer variables: the pointer itself and/or > the memory referenced by it. I thought that only pointer needs to be marked, > and also there is message [1] clearly describing what needs to be marked. Yeah, sorry for bringing some confusion. > Previously in PLyMapping_ToJsonbValue() the whole contents of PyObject was > marked as volatile, not the pointer itself which is not modified in PG_TRY: > > - /* We need it volatile, since we use it after longjmp */ > - volatile PyObject *items_v = NULL; > > So, I removed volatile qualifier here. Okay, this one looks correct to me. Well, the whole variable has been removed. > Variable 'result' is also not modified in PG_TRY, it is also non-volatile. Fine here as well. > I marked only 'key' variable in PLyObject_FromJsonbContainer() as volatile, > because it is really modified in the loop inside PG_TRY(), and > PLyObject_FromJsonbValue() call after its assignment can throw PG > exception: > + PyObject *volatile key = NULL; One thing that you are missing here is that key can become NULL when reaching the catch block, so Py_XDECREF() should be called on it only when the value is not NULL. And actually, looking closer, you don't need to have that volatile variable at all, no? Why not just declare it as a PyObject in the while loop? Also here, key and val can be NULL, so we had better only call Py_XDECREF() when they are not. On top of that, potential errors from PyDict_SetItem() should not be simply ignored, so the loop should only break when the key or the value is NULL, but not when PyDict_SetItem() has a problem. > Also I have idea to introduce a global list of Python objects that need to be > dereferenced in PG_CATCH inside PLy_exec_function() in the case of exception. 
> Then typical code will be look like that: Perhaps we could do that, but let's not juggle with the code more than necessary for a bug fix. > Yes, PyList_SetItem() and PyString_FromStringAndSize() should be checked, > but CPython's PyList_SetItem() really should not fail because list storage > is preallocated: Hm. We could add an elog() here for safety I think. That's not a big deal either. Another thing is that you cannot just return within a try block with what is added in PLyObject_FromJsonbContainer, or the error stack is not reset properly. So those returns should be replaced by breaks. -- Michael
Re: speeding up planning with partitions
On 2019/03/06 0:57, Jesper Pedersen wrote: > On 3/5/19 5:24 AM, Amit Langote wrote: >> Attached an updated version. This incorporates fixes for both Jesper's >> and Imai-san's review. I haven't been able to pin down the bug (or >> whatever) that makes throughput go down as the partition count increases, >> when tested with a --enable-cassert build. >> > > Thanks ! > > I'm seeing the throughput going down as well, but are you sure it isn't > just the extra calls of MemoryContextCheck you are seeing ? A flamegraph > diff highlights that area -- sent offline. > > A non cassert build shows the same profile for 64 and 1024 partitions. Thanks for testing. I now see what's happening here. I was aware that it's MemoryContextCheck overhead but didn't quite understand why the time it takes should increase with the number of partitions increasing, especially given that the patch makes it so that only one partition is accessed if that's what the query needs to do. What I had forgotten however is that MemoryContextCheck checks *all* contexts in every transaction, including the "partition descriptor" context which stores a partitioned table's PartitionDesc. PartitionDesc gets bigger as the number of partitions increase, so does the time to check the context it's allocated in. So, the decrease in throughput in cassert build as the number of partitions increases is not due to any fault of this patch series as I had suspected. Phew! Thanks, Amit
Re: Patch to document base64 encoding
Hi Fabien, On Tue, 5 Mar 2019 23:02:26 +0100 (CET) Fabien COELHO wrote: > > Attached: doc_base64_v3.patch > > I'm ok with referencing the historical MIME RFC. For the record, RFC 2045 is updated but not yet obsolete. The updates don't invalidate section 6.8. > "RFC2045 section 6.8" -> "RFC 2045 Section 6.8" > > you can link to the RFC directly with: > > https://tools.ietf.org/html/rfc2045#section-6.8;>RFC 2045 > Section 6.8 Done. Attached: doc_base64_v4.patch Regards, Karl Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 6765b0d584..06dfd84d22 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1752,6 +1752,9 @@ decode + + base64 + decode(string text, format text) @@ -1769,13 +1772,16 @@ encode + + base64 + encode(data bytea, format text) text Encode binary data into a textual representation. Supported -formats are: base64, hex, escape. +formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes. @@ -2365,6 +2371,23 @@ format treats a NULL as a zero-element array. + + base64 + + + + The base64 encoding of the encode + and decode functions is that + of https://tools.ietf.org/html/rfc2045#section-6.8;>RFC 2045 + section 6.8. As per the RFC, encoded lines are broken at 76 + characters. However instead of the MIME CRLF end-of-line marker, only a + newline is used for end-of-line. The carriage-return, newline, space, + and tab characters are ignored by decode. + Otherwise, an error is raised when decode is + supplied invalid base64 data including when trailing padding is + incorrect. + + See also the aggregate function string_agg in . @@ -3577,13 +3600,16 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); encode + + base64 + encode(data bytea, format text) text Encode binary data into a textual representation. Supported - formats are: base64, hex, escape. 
+ formats are: base64, hex, escape. escape converts zero bytes and high-bit-set bytes to octal sequences (\nnn) and doubles backslashes.
Re: Delay locking partitions during query execution
On Tue, Mar 5, 2019 at 8:04 PM David Rowley wrote: > Actually, I'm not sure it could work at all. It does not seem very > safe to lookup a partition's parent without actually holding a lock on > the partition and we can't lock the partition and then lock each > parent in turn as that's the exact opposite locking order that we do > when querying a partitioned table, so could result in deadlocking. > > Many moons ago in the 0002 patch in [1] I proposed to change the way > we store partition hierarchies which involved ditching bool > pg_class.relispartition in favour of Oid pg_class.relpartitionparent. > That would make parent lookups pretty fast, but... given the above it > seems like we couldn't do this at all. One thing that is both kinda nice and kinda strange about our heavyweight lock manager is that it has no idea what it is locking. If you say "please lock OID 12345" for me, it does, but it doesn't know anything about the relationship between that OID and any other thing you might want to lock. Compare that to what Gray and Reuter describe in "Transaction Processing: Concepts and Techniques", Section 7.8, Granular Locking. There, there is an idea that the set of possible locks forms a tree, and that locking a node of the tree is tantamount to locking all of its descendents. Such a concept would be useful here: you could take e.g. AccessShareLock on the root of the partitioning hierarchy and that would in effect give you that lock mode on every partition, but without needing to make a separate entry for each partition lock. Sadly, implementing such a thing for PostgreSQL seems extremely challenging. We'd somehow have to build up in shared memory an idea of what the locking hierarchy was, and then update it as DDL happens, and drop entries that aren't interesting any more so we don't run out of shared memory. 
Performance and concurrency would be really hard problems, and assumptions about the current locking model are baked into code in MANY parts of the system, so finding everything that needed to be (or could be) changed would probably be extremely challenging. But if you could make it all work we might well end up in a better state than we are today. However, I'm leaving this problem for a time when I have six months or a year to do nothing else, because I'm pretty sure that's what it would take. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Allowing extensions to supply operator-/function-specific info
> On Mar 5, 2019, at 3:56 PM, Tom Lane wrote: > > Paul Ramsey writes: >> On Mar 5, 2019, at 3:26 PM, Tom Lane wrote: >>> Hm, I think your addition of this bit is wrong: >>> >>> +/* >>> +* Arguments were swapped to put the index value on the >>> +* left, so we need the commutated operator for >>> +* the OpExpr >>> +*/ >>> +if (swapped) >>> +{ >>> +oproid = get_commutator(oproid); >>> +if (!OidIsValid(oproid)) >>> PG_RETURN_POINTER((Node *)NULL); >>> +} >>> >>> We already did the operator lookup with the argument types in the desired >>> order, so this is introducing an extra swap. The only reason it appears >>> to work, I suspect, is that all your index operators are self-commutators. > >> I was getting regression failures until I re-swapped the operator… >> SELEcT * FROM foobar WHERE ST_Within(ConstA, VarB) > > Ah ... so the real problem here is that *not* all of your functions > treat their first two inputs alike, and the hypothetical future > improvement I commented about is needed right now. I should've > looked more closely at the strategies in your table; then I would've > realized the patch as I proposed it didn't work. > > But this code isn't right either. I'm surprised you're not getting > crashes --- perhaps there aren't cases where the first and second args > are of incompatible types? Also, it's certainly wrong to be doing this > sort of swap in only one of the two code paths. > > There's more than one way you could handle this, but the way that > I was vaguely imagining was to have two strategy entries in each > IndexableFunction entry, one to apply if the first function argument > is the indexable one, and the other to apply if the second function > argument is the indexable one. 
If you leave the operator lookup as > I had it (using the already-swapped data types) then you'd have to > make sure that the latter set of strategy entries are written as > if the arguments get swapped before selecting the strategy, which > would be confusing perhaps :-( --- for instance, st_within would > use RTContainedByStrategyNumber in the first case but > RTContainsStrategyNumber in the second. But otherwise you need the > separate get_commutator step, which seems like one more catalog lookup > than you really need. > >> I feel OK about it, if for no other reason than it passes all the tests :) > > Then you're at least missing adequate tests for the 3-arg functions... > 3 args with the index column second will not work as this stands. Some of the operators are indifferent to order (&&, overlaps) and others are not (@, within) (~, contains). The 3-arg functions fortunately all have && strategies. The types on either side of the operators are always the same (geometry && geometry), ST_Intersects(geometry, geometry). I could simply be getting a free pass from the simplicity of my setup? P.
Re: few more wait events to add to docs
On Wed, Mar 6, 2019 at 2:18 PM Alvaro Herrera wrote: > On 2019-Mar-05, Jeremy Schneider wrote: > > It's possible I'm misreading this, but I'm thinking that commits > > cc5f8136 and ab9e0e71 added a few tranches which we need to add to the docs. > > > > session_dsa, session_record_table, session_typmod_table, and > > shared_tuplestore > > > > https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE > > https://postgr.es/m/CAB7nPqSU1=-3fqvz+ncgm5vmro4lszjgqsmoygwizs2hvpy...@mail.gmail.com Here's some missing documentation. Hmm, yeah. I wish these were alphabetised, I wish there was an automated warning about this, I wish these tranches were declared a better way than by adding code in RegisterLWLockTranches(). -- Thomas Munro https://enterprisedb.com missing-tranches.patch Description: Binary data
Re: few more wait events to add to docs
On 2019-Mar-05, Jeremy Schneider wrote: > It's possible I'm misreading this, but I'm thinking that commits > cc5f8136 and ab9e0e71 added a few tranches which we need to add to the docs. > > session_dsa, session_record_table, session_typmod_table, and > shared_tuplestore > > https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE https://postgr.es/m/CAB7nPqSU1=-3fqvz+ncgm5vmro4lszjgqsmoygwizs2hvpy...@mail.gmail.com :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: patch to allow disable of WAL recycling
Jerry, On 2019-Mar-05, Jerry Jelinek wrote: > Thanks again for your review. I went through your proposed patch diffs and > applied most of them to my original changes. I did a few things slightly > differently since I wanted to keep to 80 columns for the source code, > but I can revisit that if it is not an issue. Yeah, in the places where I exceeded the limit, it is largely considered not an issue. Brace curling *is* an issue, though :-) I would prefer to go with my version (which is largely just stylistic changes over yours), applying your subsequent changes on top of that. I can't remember now if I pgindented it or just changed it manually ... I should do that. > also cleaned up the confusing wording around "allocating blocks". I > ran a clean build and make check passes. The new patch is attached. Cool. Can you confirm that it still fixes the performance issue for you? (It should, since functionally it should be the same thing as yours.) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Mar-04, Robert Haas wrote: > On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > wrote: > > === Discussion points === > > > > - Progress counter for "3. sorting tuples" phase > > - Should we add pgstat_progress_update_param() in tuplesort.c like a > >"trace_sort"? > >Thanks to Peter Geoghegan for the useful advice! > > How would we avoid an abstraction violation? The theory embodied in my patch at https://postgr.es/m/20190304204607.GA15946@alvherre.pgsql is that we don't; tuplesort.c functions (index.c's IndexBuildHeapScan in my case) would get a boolean parameter to indicate whether to update some params or not -- the param number(s) to update are supposed to be generic in the sense that it's not part of any individual command's implementation (PROGRESS_SCAN_BLOCKS_DONE for what you call "blks scanned", PROGRESS_SCAN_BLOCKS_TOTAL for "blks total"), but rather defined by the "progress update provider" (index.c or tuplesort.c). One, err, small issue with that idea is that we need the param numbers not to conflict for any "progress update providers" that are to be used simultaneously by any command. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Delay locking partitions during query execution
On Wed, 6 Mar 2019 at 04:46, Tomas Vondra wrote: > > On 3/5/19 6:55 AM, David Rowley wrote: > > The only way I can think to fix this is to just never lock partitions > > at all, and if a lock is to be obtained on a partition, it must be > > instead obtained on the top-level partitioned table. That's a rather > > large change that could have large consequences, and I'm not even sure > > it's possible since we'd need to find the top-level parent before > > obtaining the lock, by then the hierarchy might have changed and we'd > > need to recheck, which seems like quite a lot of effort just to obtain > > a lock... Apart from that, it's not this patch, so looks like I'll > > need to withdraw this one :-( > > > > So you're saying we could > > 1) lookup the parent and lock it > 2) repeat the lookup to verify it did not change > > I think that could still be a win, assuming that most hierarchies will > be rather shallow (I'd say 2-3 levels will cover like 95% of cases, and > 4 levels would be 100% in practice). And the second lookup should be > fairly cheap thanks to syscache and the fact that the hierarchies do not > change very often. Actually, I'm not sure it could work at all. It does not seem very safe to lookup a partition's parent without actually holding a lock on the partition and we can't lock the partition and then lock each parent in turn as that's the exact opposite locking order that we do when querying a partitioned table, so could result in deadlocking. Many moons ago in the 0002 patch in [1] I proposed to change the way we store partition hierarchies which involved ditching bool pg_class.relispartition in favour of Oid pg_class.relpartitionparent. That would make parent lookups pretty fast, but... given the above it seems like we couldn't do this at all. 
[1] https://www.postgresql.org/message-id/CAKJS1f9QjUwQrio20Pi%3DyCHmnouf4z3SfN8sqXaAcwREG6k0zQ%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
fix for BUG #3720: wrong results at using ltree
Hi all, Here is my attempt to fix a 12-years old ltree bug (which is a todo item). I see it's not backward-compatible, but in my understanding that's what is documented. Previous behavior was inconsistent with documentation (where single asterisk should match zero or more labels). http://archives.postgresql.org/pgsql-bugs/2007-11/msg00044.php diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out index 8226930905..b1878a99b3 100644 --- a/contrib/ltree/expected/ltree.out +++ b/contrib/ltree/expected/ltree.out @@ -676,7 +676,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.d.*'; SELECT 'a.b.c.d.e'::ltree ~ '*.!d.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.!d'; @@ -706,7 +706,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.!e'; SELECT 'a.b.c.d.e'::ltree ~ '*.!e.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!e'; @@ -724,7 +724,7 @@ SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!d'; SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!d.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ 'a.*.!f.*'; @@ -742,7 +742,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.!f.*'; SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.!d.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.a.!d.*'; @@ -766,13 +766,13 @@ SELECT 'a.b.c.d.e'::ltree ~ 'a.!d.*'; SELECT 'a.b.c.d.e'::ltree ~ '*.a.*.!d.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.!b.c.*'; @@ -784,7 +784,7 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.!b.c.*'; SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*.c.*'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ '!b.*.c.*'; @@ -832,31 +832,31 @@ SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*.!c.*.e'; SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*{1}.!c.*.e'; ?column? -- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ 'a.!b.*{1}.!c.*.e'; ?column? -- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ '!b.*{1}.!c.*.e'; ?column? -- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*{1}.!c.*.e'; ?column? 
-- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*.!c.*.e'; ?column? -- - f + t (1 row) SELECT 'a.b.c.d.e'::ltree ~ '!b.!c.*'; @@ -886,19 +886,19 @@ SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*.!c.*'; SELECT 'a.b.c.d.e'::ltree ~ '*{1}.!b.*{1}.!c.*'; ?column? -- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ 'a.!b.*{1}.!c.*'; ?column? -- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ '!b.*{1}.!c.*'; ?column? -- - t + f (1 row) SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*{1}.!c.*'; @@ -909,6 +909,18 @@ SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*{1}.!c.*'; SELECT 'a.b.c.d.e'::ltree ~ '*.!b.*.!c.*'; ?column? +-- + t +(1 row) + +SELECT '5.0.1.0'::ltree ~ '5.!0.!0.0'; + ?column? +-- + f +(1 row) + +SELECT 'a.b'::ltree ~ '!a.!a'; + ?column? -- f (1 row) diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c index b6d2deb1af..392c193fd8 100644 --- a/contrib/ltree/lquery_op.c +++ b/contrib/ltree/lquery_op.c @@ -19,16 +19,6 @@ PG_FUNCTION_INFO_V1(lt_q_rregex); #define NEXTVAL(x) ( (lquery*)( (char*)(x) + INTALIGN( VARSIZE(x) ) ) ) -typedef struct -{ - lquery_level *q; - int nq; - ltree_level *t; - int nt; - int posq; - int post; -} FieldNot; - static char * getlexeme(char *start, char *end, int *len) { @@ -50,7 +40,7 @@ getlexeme(char *start, char *end, int *len) } bool - compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *, const char *, size_t), bool anyend) +compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *, const char *, size_t), bool anyend) { char *endt = t->name + t->len; char *endq = qn + len; @@ -108,6 +98,9 @@ checkLevel(lquery_level *curq, ltree_level *curt) int (*cmpptr) (const char *, const char *, size_t); lquery_variant *curvar = LQL_FIRST(curq); int i; + boolsuccess; + + success = (curq->flag & LQL_NOT) ? 
false : true; for (i = 0; i < curq->numvar; i++) { @@ -115,8 +108,9 @@ checkLevel(lquery_level *curq, ltree_level *curt) if (curvar->flag & LVAR_SUBLEXEME) { - if (compare_subnode(curt, curvar->name, curvar->len, cmpptr, (curvar->flag & LVAR_ANYEND))) -return true; + if (compare_subnode(curt, curvar->name, curvar->len, cmpptr, + (curvar->flag & LVAR_ANYEND))) +return success; } else if ( ( @@ -126,22 +120,12 @@ checkLevel(lquery_level *curq, ltree_level *curt) (*cmpptr) (curvar->name, curt->name, curvar->len) == 0) { - return true; + return success; } curvar = LVAR_NEXT(curvar); } - return false; -} - -/* -void -printFieldNot(FieldNot *fn ) { - while(fn->q) { - elog(NOTICE,"posQ:%d lenQ:%d posT:%d lenT:%d", fn->posq,fn->nq,fn->post,fn->nt); - fn++; - } + return !success; } -*/ static struct { @@ -154,7 +138,7 @@ static struct };
Re: dropdb --force
Here is Pavel's patch rebased to master branch, added the dropdb --force option, a test case & documentation. I'm willing to work on it if needed. What are possible bad things that could happen here? Is the documentation clear enough? Thanks. On Tue, Dec 18, 2018 at 4:34 PM Marti Raudsepp wrote: > > Hi > > > út 18. 12. 2018 v 16:11 odesílatel Filip Rembiałkowski > > napsal: > >> Please share opinions if this makes sense at all, and has any chance > >> going upstream. > > Clearly since Pavel has another implementation of the same concept, > there is some interest in this feature. :) > > On Tue, Dec 18, 2018 at 5:20 PM Pavel Stehule wrote: > > Still one my customer use a patch that implement FORCE on SQL level. It is > > necessary under higher load when is not easy to synchronize clients. > > I think Filip's approach of setting pg_database.datallowconn='false' > is pretty clever to avoid the synchronization problem. But it's also a > good idea to expose this functionality via DROP DATABASE in SQL, like > Pavel's patch, not just the 'dropdb' binary. > > If this is to be accepted into PostgreSQL core, I think the two > approaches should be combined on the server side. > > Regards, > Marti Raudsepp diff --git a/doc/src/sgml/ref/drop_database.sgml b/doc/src/sgml/ref/drop_database.sgml index 3ac06c984a..84df485e11 100644 --- a/doc/src/sgml/ref/drop_database.sgml +++ b/doc/src/sgml/ref/drop_database.sgml @@ -21,7 +21,7 @@ PostgreSQL documentation -DROP DATABASE [ IF EXISTS ] name +DROP DATABASE [ IF EXISTS ] name [ FORCE ] @@ -32,9 +32,11 @@ DROP DATABASE [ IF EXISTS ] name DROP DATABASE drops a database. It removes the catalog entries for the database and deletes the directory containing the data. It can only be executed by the database owner. - Also, it cannot be executed while you or anyone else are connected - to the target database. (Connect to postgres or any - other database to issue this command.) 
+ Also, it cannot be executed while you are connected to the target database. + (Connect to postgres or any other database to issue this + command.) + If anyone else is connected to the target database, this command will fail + - unless you use the FORCE option described below. @@ -64,6 +66,24 @@ DROP DATABASE [ IF EXISTS ] name + + +FORCE + + + Attempt to terminate all existing connections to the target database. + + + This will fail if the current user has no permission to terminate other + connections. Required permissions are the same as with + pg_terminate_backend, described + in . + + This will also fail if the connections do not terminate in 60 seconds. + + + + diff --git a/doc/src/sgml/ref/dropdb.sgml b/doc/src/sgml/ref/dropdb.sgml index 38f38f01ce..365ba317c1 100644 --- a/doc/src/sgml/ref/dropdb.sgml +++ b/doc/src/sgml/ref/dropdb.sgml @@ -86,6 +86,20 @@ PostgreSQL documentation + + -f + --force + + +Force termination of connected backends before removing the database. + + +This will add the FORCE option to the DROP +DATABASE command sent to the server. + + + + -V --version diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index d207cd899f..2137bee3a7 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -487,7 +487,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) * potential waiting; we may as well throw an error first if we're gonna * throw one.
*/ - if (CountOtherDBBackends(src_dboid, , )) + if (CountOtherDBBackends(src_dboid, , , false)) ereport(ERROR, (errcode(ERRCODE_OBJECT_IN_USE), errmsg("source database \"%s\" is being accessed by other users", @@ -777,7 +777,7 @@ createdb_failure_callback(int code, Datum arg) * DROP DATABASE */ void -dropdb(const char *dbname, bool missing_ok) +dropdb(const char *dbname, bool missing_ok, bool force) { Oid db_id; bool db_istemplate; @@ -785,6 +785,7 @@ dropdb(const char *dbname, bool missing_ok) HeapTuple tup; int notherbackends; int npreparedxacts; + int loops = 0; int nslots, nslots_active; int nsubscriptions; @@ -868,12 +869,33 @@ dropdb(const char *dbname, bool missing_ok) * * As in CREATE DATABASE, check this after other error conditions. */ - if (CountOtherDBBackends(db_id, , )) - ereport(ERROR, -(errcode(ERRCODE_OBJECT_IN_USE), - errmsg("database \"%s\" is being accessed by other users", - dbname), - errdetail_busy_db(notherbackends, npreparedxacts))); + for (;;) + { + /* + * CountOtherDBBackends checks usage of the database by other backends and + * tries to wait 5 sec. We raise a warning after 1 minute and an error + * after 5 minutes. + */ + if (!CountOtherDBBackends(db_id, , , force)) +
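The retry loop the patch adds to dropdb() can be sketched as follows. This is a Python model of the control flow only — CountOtherDBBackends() and backend termination are stubbed out as caller-supplied callables, and the thresholds (warn after one minute, give up after five, re-check every five seconds) come from the patch's code comment:

```python
import time

def drop_database_wait(count_other_backends, terminate, force,
                       warn_after=60, fail_after=300, interval=5):
    """Model of the patched dropdb() loop: without FORCE, any other
    backend is an immediate error; with FORCE, keep terminating and
    re-checking, warn after warn_after seconds, fail after fail_after."""
    waited = 0
    while count_other_backends():
        if not force:
            raise RuntimeError("database is being accessed by other users")
        if waited >= fail_after:
            raise RuntimeError("timed out terminating other sessions")
        if waited == warn_after:
            print("WARNING: still waiting for other sessions to exit")
        terminate()
        time.sleep(interval)
        waited += interval

# Example: two lingering backends that go away when signalled.
backends = ["pid 101", "pid 102"]
drop_database_wait(lambda: len(backends), lambda: backends.pop(),
                   force=True, interval=0)
assert backends == []
```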
Re: Ordered Partitioned Table Scans
On Wed, 6 Mar 2019 at 07:17, Robert Haas wrote: > > On Wed, Dec 19, 2018 at 5:08 PM David Rowley > wrote: > > With my idea for using live_parts, we'll process the partitions > > looking for interleaved values on each query, after pruning takes > > place. In this case, we'll see the partitions are naturally ordered. I > > don't really foresee any issues with that additional processing since > > it will only be a big effort when there are a large number of > > partitions, and in those cases the planner already has lots of work to > > do. Such processing is just a drop in the ocean when compared to path > > generation for all those partitions. > > I agree that partitions_are_ordered() is cheap enough in this patch > that it probably doesn't matter whether we cache the result. On the > other hand, that's mostly because you haven't handled the hard cases - > e.g. interleaved list partitions. If you did, then it would be > expensive, and it probably *would* be worth caching the result. Now > maybe those hard cases aren't worth handling anyway. I admit that I didn't understand the idea of the flag at the time, having failed to see the point of it since if partitions are plan-time pruned then I had thought the flag would be useless. However, as Julien explained, it would be a flag of "Yes" means "Yes", okay to do ordered scans, and "No" means "Recheck if there are pruned partitions using only the non-pruned ones". That seems fine and very sane to me now that I understand it. FWIW, my moment of realisation came in [1]. However, my thoughts are that adding new flags and the live_parts field in RelOptInfo raise the bar a bit for this patch. There's already quite a number of partition-related fields in RelOptInfo. Understanding what each of those does is not trivial, so I figured that this patch would be much easier to consider if I skipped that part for the first cut version. 
I feared a lot of instability of what fields exist from Amit's planner improvement patches and I didn't want to deal with dependencies from WIP. I had to deal with that last year on run-time pruning and it turned out not to be fun. > You also seem to be saying that since we run-time partitioning pruning > might change the answer, caching the initial answer is pointless. But > I think Julien has made a good argument for why that's wrong: if the > initial answer is that the partitions are ordered, which will often be > true, then we can skip all later checks. > > So I am OK with the fact that this patch doesn't choose to cache it, > but I don't really buy any of your arguments for why it would be a bad > idea. OK, good. I agree. For the record; I want to steer clear of the flag in this first cut version, especially so now given what time it is. [1] https://www.postgresql.org/message-id/cakjs1f_r51oapsn1oc4i36d7vznnihnk+1widfg0qrvb_eo...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
On Tue, Mar 5, 2019 at 3:46 AM David Steele wrote: > I have marked this entry as targeting PG13 since it is too late to > consider for PG12. Sounds right. As Peter said himself, this patch is WIP, so it's too soon to consider integrating it. This is also fairly evident from the content of the patch, which is full of comments marked XXX and PEMOSER that obviously need to be addressed somehow. For all of that, I'd say this patch is much closer to being on the right track than the old design, even though it's clear that a lot of work remains. Some desultory review comments: +#define setSweepline(datum) \ + node->sweepline = datumCopy(datum, node->datumFormat->attbyval, node->datumFormat->attlen) + +#define freeSweepline() \ + if (! node->datumFormat->attbyval) pfree(DatumGetPointer(node->sweepline)) I am quite dubious about this. Almost everywhere in the executor we rely on slots to keep track of tuples and free memory for us. It seems unlikely that this should be the one place where we have code that looks completely different. Aside from that, this seems to propose there is only one relevant column, which seems like an assumption that we probably don't want to bake too deeply into the code. + ereport(ERROR, + (errcode(ERRCODE_NOT_NULL_VIOLATION), + errmsg("Attribute \"%s\" at position %d is null. Temporal " \ + "adjustment not possible.", + NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname), + attnum))); This error message is marked as translatable (ereport used rather than elog) but the error-message text is unsuitable for a user-facing error. If this is a user-facing error, we need a better error, or maybe we need to rethink the semantics altogether (e.g. just skip such rows instead of erroring out, or something). If it's an internal error that should not be possible no matter what query the user enters, and is only here as a sanity test, just simplify and use elog (and maybe add some comments explaining why that's so). 
+ * heapGetAttrNotNull I may be a bit behind the times here, but it seems to me that this is functionally equivalent to slotGetAttrNotNull and thus we shouldn't need both. + boolempty = false; Not much point in declaring a variable whose value is never changed and whose value takes up exactly the same number of characters as the variable name. + * temporalAdjustmentStoreTuple + * While we store result tuples, we must add the newly calculated temporal + * boundaries as two scalar fields or create a single range-typed field + * with the two given boundaries. This doesn't seem to describe what the function is actually doing. + * This should ideally be done with RangeBound types on the right-hand-side + * created during range_split execution. Otherwise, we loose information about + * inclusive/exclusive bounds and infinity. We would need to implement btree + * operators for RangeBounds. This seems like an idea for future improvement, but it's unclear to me how the proposed idea is different from the state created by the patch. Also, materializing the slot to a heap tuple so that we can modify it seems inefficient. I wonder if we can't use a virtual slot here. + if (qual->opno == OID_RANGE_EQ_OP) { + Oid rngtypid; + + // XXX PEMOSER Change opfamily and opfunc + qual->opfuncid = F_RANGE_CONTAINS; //<<--- opfuncid can be 0 during planning + qual->opno = OID_RANGE_CONTAINS_ELEM_OP; //OID_RANGE_CONTAINS_OP; + clause->isnormalize = true; + + // Attention: cannot merge using non-equality operator 3890 <--- OID_RANGE_CONTAINS_OP + opfamily = 4103; //range_inclusion_ops from pg_opfamily.h + + rngtypid = exprType((Node*)clause->lexpr->expr); + clause->range_typcache = lookup_type_cache(rngtypid, TYPECACHE_RANGE_INFO); + testmytypcache = clause->range_typcache; + } else { + clause->isnormalize = false; + } This is pretty obviously a mess of hardcoded constants which are, furthermore, not explained. 
I can't tell whether this is intended as a dirty hack to make some cases work while other things remain broken, or whether you believe that OID_RANGE_EQ_OP. If it's the latter, this needs a much better explanation. You probably realize that "Attention: cannot merge using non-equality operator 3890" is not a compelling justification, and that hard-coding all of these things here doesn't look good. In general, this patch needs both user-facing documentation and more and better code comments. I would suggest writing the user-facing documentation soon. It is pretty clear that you've got the basics of this working, but it's impossible to understand what the semantics are supposed to be except by staring at the code until you figure it out, or running experiments. People who are interested in this functionality are more likely to provide useful feedback if there's something they can read and go "yeah, that looks right" or "wait, that sounds wrong." Also, there may be places where, in the process of documenting, you realize that things
Re: NOT IN subquery optimization
On Wed, 6 Mar 2019 at 12:25, David Rowley wrote: > That sounds fine. I'll take mine elsewhere since I didn't start this thread. Moved to https://www.postgresql.org/message-id/CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Allowing extensions to supply operator-/function-specific info
Paul Ramsey writes: > On Mar 5, 2019, at 3:26 PM, Tom Lane wrote: >> Hm, I think your addition of this bit is wrong: >> >> +/* >> +* Arguments were swapped to put the index value on the >> +* left, so we need the commutated operator for >> +* the OpExpr >> +*/ >> +if (swapped) >> +{ >> +oproid = get_commutator(oproid); >> +if (!OidIsValid(oproid)) >> PG_RETURN_POINTER((Node *)NULL); >> +} >> >> We already did the operator lookup with the argument types in the desired >> order, so this is introducing an extra swap. The only reason it appears >> to work, I suspect, is that all your index operators are self-commutators. > I was getting regression failures until I re-swapped the operator… > SELEcT * FROM foobar WHERE ST_Within(ConstA, VarB) Ah ... so the real problem here is that *not* all of your functions treat their first two inputs alike, and the hypothetical future improvement I commented about is needed right now. I should've looked more closely at the strategies in your table; then I would've realized the patch as I proposed it didn't work. But this code isn't right either. I'm surprised you're not getting crashes --- perhaps there aren't cases where the first and second args are of incompatible types? Also, it's certainly wrong to be doing this sort of swap in only one of the two code paths. There's more than one way you could handle this, but the way that I was vaguely imagining was to have two strategy entries in each IndexableFunction entry, one to apply if the first function argument is the indexable one, and the other to apply if the second function argument is the indexable one. 
If you leave the operator lookup as I had it (using the already-swapped data types) then you'd have to make sure that the latter set of strategy entries are written as if the arguments get swapped before selecting the strategy, which would be confusing perhaps :-( --- for instance, st_within would use RTContainedByStrategyNumber in the first case but RTContainsStrategyNumber in the second. But otherwise you need the separate get_commutator step, which seems like one more catalog lookup than you really need. > I feel OK about it, if for no other reason than it passes all the tests :) Then you're at least missing adequate tests for the 3-arg functions... 3 args with the index column second will not work as this stands. regards, tom lane
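Tom's two-strategy-entries idea can be sketched as a lookup table. This is illustrative Python with made-up table contents, not the actual PostGIS support-function code; the strategy names mirror PostgreSQL's RT* strategy numbers, and the point is that knowing which argument position is indexed selects the strategy directly, with no get_commutator() catalog lookup:

```python
# Hypothetical model: each indexable function carries one strategy per
# indexable argument position.  Entries here are examples, not the real
# PostGIS support list.
INDEXABLE_FUNCTIONS = {
    # function: (strategy if arg 1 is indexed, strategy if arg 2 is indexed)
    "st_within":     ("RTContainedByStrategyNumber", "RTContainsStrategyNumber"),
    "st_contains":   ("RTContainsStrategyNumber", "RTContainedByStrategyNumber"),
    "st_intersects": ("RTOverlapStrategyNumber", "RTOverlapStrategyNumber"),
}

def strategy_for(func, indexed_arg):
    """indexed_arg is 1 or 2: which function argument the index is on."""
    return INDEXABLE_FUNCTIONS[func][indexed_arg - 1]

# ST_Within(ConstA, VarB) with the index on VarB (the second argument)
# wants "VarB contains ConstA", per Tom's description:
assert strategy_for("st_within", 2) == "RTContainsStrategyNumber"
assert strategy_for("st_within", 1) == "RTContainedByStrategyNumber"
```

Asymmetric functions like st_within get different strategies per position, which is exactly the case that broke when every operator was assumed to be a self-commutator.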
Converting NOT IN to anti-joins during planning
Way back in [1] I proposed that we allow NOT IN subqueries to be converted into an anti-join where the subquery cannot return any NULL values. As Tom pointed out to me, I had neglected to consider that the outer side producing NULLs can cause the anti-join plan to produce incorrect results. The difference is that a NOT IN where the subquery returns no results filters nothing; otherwise it filters the nulls, plus the records that exist in the subquery. More recently over on [2], Jim and Zheng have re-proposed making improvements in this area. Their ideas are slightly different from mine, as they propose to add an OR .. IS NULL clause to the join condition to handle the problem of the outer side being NULL with an empty subquery. Before Jim and Zheng's patch arrived I managed to fix the known problems with my 4-year-old patch thinking it would have been welcome, but it seems that's not the case, perhaps due to the differing ideas we have on how this should work. At that time I didn't think the other patch actually existed yet... oops Anyway, I don't really want to drop my patch, as I believe what it does is correct and there's debate on the other thread about how good an idea adding these OR clauses to the join quals is... (forces nested loop plan (see [3])), but it appears Jim and Zheng are fairly set on that idea. Hence... I'm moving my patch here, so it can be debated without interfering with the other work that's going on in this area. There has also been some review of my patch in [4], and of course, originally in [1]. The background is really:

1. Seems fine to do this transformation when there are no nulls.
2. We don't want to cost anything to decide whether to do the transformation or not, i.e. do it regardless, in all possible cases where it's valid to do so. We already do that for NOT EXISTS; no apparent reason to think this case is any different.
3. Need to consider what planner overhead there is from doing this and failing to do the conversion due to lack of evidence for no NULLs.

I've not done #3, at least not with the latest patch. There's already a CF entry [5] for this patch, although it's targeting PG13. The latest patch is attached.

[1] https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/1550706289606-0.p...@n3.nabble.com
[3] https://www.postgresql.org/message-id/CAKJS1f_ZwXtzPz6wDpBXgAVYuxforsqpc6hBw05Y6aPGcOONfA%40mail.gmail.com
[4] https://www.postgresql.org/message-id/18203.1551543939%40sss.pgh.pa.us
[5] https://commitfest.postgresql.org/22/2020/
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services not_in_anti_join_v1.3.patch Description: Binary data
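The NULL hazard described above — an empty subquery filters nothing, while a non-empty subquery filters the NULL outer rows too — follows from SQL's three-valued NOT IN. A small Python model of those SQL semantics (None standing in for NULL; this models the language semantics, not the planner code):

```python
def sql_not_in(x, subquery_values):
    """Three-valued NOT IN, with None standing in for SQL NULL.
    Returns True, False, or None (unknown); a WHERE clause keeps only True."""
    if not subquery_values:
        return True                      # empty subquery filters nothing
    if x is None:
        return None                      # NULL NOT IN (non-empty ...) is unknown
    if x in [v for v in subquery_values if v is not None]:
        return False                     # definite match -> filtered
    if any(v is None for v in subquery_values):
        return None                      # can't rule out a match against NULL
    return True

# A naive anti-join would keep a NULL outer row regardless of the subquery;
# NOT IN keeps it only when the subquery is empty:
assert sql_not_in(None, []) is True      # kept
assert sql_not_in(None, [1, 2]) is None  # filtered (not True)
assert sql_not_in(3, [1, 2]) is True     # kept, as an anti-join would
assert sql_not_in(1, [2, None]) is None  # filtered: NULL in the subquery
```

This is why the transformation is only valid when neither side can produce NULLs — exactly the precondition the patch proves before converting.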
Re: Allowing extensions to supply operator-/function-specific info
> On Mar 5, 2019, at 3:26 PM, Tom Lane wrote: > > Paul Ramsey writes: >> Thanks for the patch, I’ve applied and smoothed and taken your advice on >> schema-qualified lookups as well. > > Hm, I think your addition of this bit is wrong: > > +/* > +* Arguments were swapped to put the index value on the > +* left, so we need the commutated operator for > +* the OpExpr > +*/ > +if (swapped) > +{ > +oproid = get_commutator(oproid); > +if (!OidIsValid(oproid)) > PG_RETURN_POINTER((Node *)NULL); > +} > > We already did the operator lookup with the argument types in the desired > order, so this is introducing an extra swap. The only reason it appears > to work, I suspect, is that all your index operators are self-commutators. I was getting regression failures until I re-swapped the operator… SELECT * FROM foobar WHERE ST_Within(ConstA, VarB) Place the indexed operator in the Left, now: Left == VarB Right == ConstA Strategy == Within get_opfamily_member(opfamilyoid, Left, Right, Within) Unless we change the strategy number when we assign the left/right we’re looking up an operator for “B within A”, so we’re backwards. I feel OK about it, if for no other reason than it passes all the tests :) P
Re: Allowing extensions to supply operator-/function-specific info
Paul Ramsey writes: > Thanks for the patch, I’ve applied and smoothed and taken your advice on > schema-qualified lookups as well. Hm, I think your addition of this bit is wrong: +/* +* Arguments were swapped to put the index value on the +* left, so we need the commutated operator for +* the OpExpr +*/ +if (swapped) +{ +oproid = get_commutator(oproid); +if (!OidIsValid(oproid)) PG_RETURN_POINTER((Node *)NULL); +} We already did the operator lookup with the argument types in the desired order, so this is introducing an extra swap. The only reason it appears to work, I suspect, is that all your index operators are self-commutators. regards, tom lane
Re: NOT IN subquery optimization
On Wed, 6 Mar 2019 at 03:37, Tom Lane wrote: > > David Steele writes: > > I'm not sure if I have an issue with competing patches on the same > > thread. I've seen that before and it can lead to a good outcome. It > > case, as you say, also lead to confusion. > > It's a bit of a shame that the cfbot will only be testing one of them > at a time if we leave it like this. I kind of lean towards the > two-thread, two-CF-entry approach because of that. The amount of > confusion is a constant. That sounds fine. I'll take mine elsewhere since I didn't start this thread. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: using index or check in ALTER TABLE SET NOT NULL
On Wed, 6 Mar 2019 at 08:41, Sergei Kornilov wrote: > In this case we need extra argument for ConstraintImpliedByRelConstraint for > some caller-provided existConstraint, right? Along with Relation itself? Then > I need make copy of existConstraint, append relation constraints and call > predicate_implied_by. If I understood correctly pseudocode would be: > > PartConstraintImpliedByRelConstraint(rel, testConstraint) > generate notNullConstraint from NOT NULL table attributes > call ConstraintImpliedByRelConstraint(rel, testConstraint, > notNullConstraint) > > ConstraintImpliedByRelConstraint(rel, testConstraint, existsConstraint) > copy existsConstraint to localExistsConstraint variable > append relation constraints to localExistsConstraint list > call predicate_implied_by > > I thing redundant IS NOT NULL check is not issue here and we not need extra > arguments for ConstraintImpliedByRelConstraint. Was not changed in attached > version, but I will change if you think this would be better design. I was really just throwing the idea out there for review. I don't think that I'm insisting on it. FWIW I think you could get away without the copy of the constraint providing it was documented that the list is modified during the ConstraintImpliedByRelConstraint call and callers must make a copy themselves if they need an unmodified version. Certainly, PartConstraintImpliedByRelConstraint won't need to make a copy, so there's probably not much reason to assume that possible future callers will. Providing I'm imagining it correctly, I do think the patch is slightly cleaner with that change. It's: a) slightly more efficient due to not needlessly checking a bunch of IS NOT NULLs (imagine a 1000 column table with almost all NOT NULL and a single CHECK constraint); and b) patch has a smaller footprint (does not modify existing callers of PartConstraintImpliedByRelConstraint()); and c) does not modify an existing API function. 
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Should we increase the default vacuum_cost_limit?
Thanks for chipping in on this. On Wed, 6 Mar 2019 at 01:53, Tomas Vondra wrote: > But on the other hand it feels a bit weird that we increase this one > value and leave all the other (also very conservative) defaults alone. Which others did you have in mind? Like work_mem, shared_buffers? If so, I mentioned in the initial post that I don't see vacuum_cost_limit as in the same category as those. It's not like PostgreSQL won't start on a tiny server if vacuum_cost_limit is too high, but you will have issues with too big a shared_buffers, for example. I think if we insist that this patch is a review of all the "how big is your server" GUCs then that's raising the bar significantly and unnecessarily for what I'm proposing here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
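To put a number on why the 200 default is so conservative, here is a back-of-envelope calculation of the write throughput the cost limit implies. The 20 ms delay and page-dirty cost of 20 are the long-standing defaults but are not quoted in this thread, so treat them as assumptions:

```python
# Rough throughput ceiling implied by vacuum_cost_limit, assuming
# autovacuum_vacuum_cost_delay = 20 ms and vacuum_cost_page_dirty = 20
# (assumed defaults, not values from this thread).
def max_dirty_pages_per_sec(cost_limit, cost_delay_ms=20, page_dirty_cost=20):
    rounds_per_sec = 1000.0 / cost_delay_ms
    return rounds_per_sec * (cost_limit / page_dirty_cost)

def mb_per_sec(pages_per_sec, page_kb=8):
    return pages_per_sec * page_kb / 1024.0

# The 15-year-old default of 200 caps vacuum at ~500 dirtied pages/s,
# roughly 3.9 MB/s with 8 kB pages; the proposed 10x bump lifts that
# to ~5000 pages/s, roughly 39 MB/s.
assert max_dirty_pages_per_sec(200) == 500.0
assert max_dirty_pages_per_sec(2000) == 5000.0
```

A few MB/s is easily saturated on modern hardware, which is the practical argument for raising the default independently of the "how big is your server" GUCs.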
Re: patch to allow disable of WAL recycling
Alvaro, Thanks again for your review. I went through your proposed patch diffs and applied most of them to my original changes. I did a few things slightly differently since I wanted to keep to 80 columns for the source code, but I can revisit that if it is not an issue. I also cleaned up the confusing wording around "allocating blocks". I ran a clean build and make check passes. The new patch is attached. Thanks, Jerry On Wed, Feb 27, 2019 at 4:12 PM Alvaro Herrera wrote: > On 2019-Feb-05, Jerry Jelinek wrote: > > > First, since last fall, we have found another performance problem related > > to initializing WAL files. I've described this issue in more detail > below, > > but in order to handle this new problem, I decided to generalize the > patch > > so the tunable refers to running on a Copy-On-Write filesystem instead of > > just being specific to WAL recycling. Specifically, I renamed the GUC > > tunable from 'wal_recycle' to 'wal_cow_fs'. Hopefully this will make it > > more obvious what is being tuned and will also be more flexible if there > > are other problems in the future which are related to running on a COW > > filesystem. I'm happy to choose a different name for the tunable if > people > > don't like 'wal_cow_fs'. > > I think the idea of it being a generic tunable for assorted behavior > changes, rather than specific to WAL recycling, is a good one. I'm > unsure about your proposed name -- maybe "wal_cow_filesystem" is better? > > I'm rewording your doc addition a little bit. Here's my proposal: > > > This parameter should only be set to on when > the WAL > resides on a Copy-On-Write > (COW) > filesystem. > Enabling this option adjusts behavior to take advantage of the > filesystem characteristics (for example, recycling WAL files and > zero-filling new WAL files are disabled). > > This part sounds good enough to me -- further suggestions welcome.
> > I'm less sure about this phrase: > > This setting is only appropriate for filesystems which > allocate new disk blocks on every write. > > Is "... which allocate new disk blocks on every write" a technique > distinct from CoW itself? I'm confused as to what it means, or how can > the user tell whether they are on such a filesystem. > > Obviously you're thinking that ZFS is such a filesystem and everybody > who has pg_wal on ZFS should enable this option. What about, say, Btrfs > -- should they turn this option on? Browsing the wikipedia, I find that > Windows has this ReFS thing that apparently is also CoW, but NTFS isn't. > I don't think either Btrfs or ReFS are realistic options to put pg_wal > on, so let's just list the common filesystems for which users are > supposed to enable this option ... which I think nowadays is just ZFS. > All in all, I would replace this phrase with something like: "This > setting should be enabled when pg_wal resides on a ZFS filesystem or > similar." That should be weasely enough that it's clear that we expect > users to do the homework when on unusual systems, while actively pointing > out the most common use case. > > > Finally, the patch now includes bypassing the zero-fill for new WAL files > > when wal_cow_fs is true. > > That makes sense. I think all these benchmarks Tomas Vondra run are not > valid anymore ... > > The attached v2 has assorted cosmetic cleanups. If you can validate it, > I would appreciate it. > > -- > Álvaro Herrerahttps://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > 0001-cow-filesystem.patch Description: Binary data
Re: [HACKERS] Incomplete startup packet errors
On 3/4/19 7:42 AM, Christoph Berg wrote: > Re: Andrew Dunstan 2019-03-04 > <7cc6d2c1-bd87-9890-259d-36739c247...@2ndquadrant.com> >> Looks good to me. > +1. > OK, I think we have agreement on Tom's patch. Do we want to backpatch it? It's a change in behaviour, but I find it hard to believe anyone relies on the existence of these annoying messages, so my vote would be to backpatch it. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: A separate table level option to control compression
On 2/6/19 2:32 AM, Pavan Deolasee wrote: > Hello, > > Currently either the table level option `toast_tuple_target` or the > compile time default `TOAST_TUPLE_TARGET` is used to decide whether a > new tuple should be compressed or not. While this works reasonably > well for most situations, at times the user may not want to pay the > overhead of toasting, yet take benefits of inline compression. > > I would like to propose a new table level option, > compress_tuple_target, which can be set independently of > toast_tuple_target, and is checked while deciding whether to compress > the new tuple or not. > > For example, > > CREATE TABLE compresstest250 (a int, b text) WITH > (compress_tuple_target = 250); > CREATE TABLE compresstest2040 (a int, b text) WITH > (compress_tuple_target = 2040); > > -- shouldn't get compressed nor toasted > INSERT INTO compresstest250 VALUES (1, repeat('1234567890',20)); > > -- should get compressed, but not toasted > INSERT INTO compresstest250 VALUES (2, repeat('1234567890',30)); > > -- shouldn't get compressed nor toasted > INSERT INTO compresstest2040 VALUES (1, repeat('1234567890',20)); > INSERT INTO compresstest2040 VALUES (2, repeat('1234567890',30)); > > Without this patch, the second INSERT will not compress the tuple > since its length is less than the toast threshold. With the patch and > after setting table level option, one can compress such tuples. > > The attached patch implements this idea. > This is a nice idea, and I'm a bit surprised it hasn't got more attention. The patch itself seems very simple and straightforward, although it could probably do with having several sets of eyeballs on it. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
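The behaviour change Pavan proposes reduces to a two-threshold check. Here is a simplified model — not the actual heap/TOAST code — where the ~2 kB toast threshold and the byte counts are assumptions for illustration:

```python
# Simplified model of the proposed decision: compression is triggered by
# compress_tuple_target, toasting (out-of-line storage) independently by
# toast_tuple_target.  The 2032-byte default stands in for the compile-time
# TOAST_TUPLE_TARGET on 8 kB pages (an assumption here).
def tuple_treatment(tuple_len, compress_tuple_target, toast_tuple_target=2032):
    compress = tuple_len > compress_tuple_target
    toast = tuple_len > toast_tuple_target
    return (compress, toast)

# Mirrors the INSERT examples in the mail (tuple lengths are illustrative):
assert tuple_treatment(230, 250) == (False, False)    # 20x repeat: untouched
assert tuple_treatment(330, 250) == (True, False)     # 30x repeat: compressed only
assert tuple_treatment(330, 2040, 2040) == (False, False)  # high target: untouched
```

Today the compression decision is tied to the toast threshold; splitting the two targets is what lets the second case compress inline without paying the toasting overhead.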
Re: Should we increase the default vacuum_cost_limit?
On 2019-03-05 17:14:55 -0500, Andrew Dunstan wrote: > This patch is tiny, seems perfectly reasonable, and has plenty of > support. I'm going to commit it shortly unless there are last minute > objections. +1
Re: Should we increase the default vacuum_cost_limit?
On 2/25/19 8:38 AM, David Rowley wrote: > On Tue, 26 Feb 2019 at 02:06, Joe Conway wrote: >> On 2/25/19 1:17 AM, Peter Geoghegan wrote: >>> On Sun, Feb 24, 2019 at 9:42 PM David Rowley >>> wrote: The current default vacuum_cost_limit of 200 seems to be 15 years old and was added in f425b605f4e. Any supporters for raising the default? >>> I also think that the current default limit is far too conservative. >> I agree entirely. In my experience you are usually much better off if >> vacuum finishes quickly. Personally I think our default scale factors >> are horrible too, especially when there are tables with large numbers of >> rows. > Agreed that the scale factors are not perfect, but I don't think > changing them is as quite a no-brainer as the vacuum_cost_limit, so > the attached patch just does the vacuum_cost_limit. > > I decided to do the times by 10 option that I had mentioned Ensue > debate about that... > > I'll add this to the March commitfest and set the target version as PG12. > This patch is tiny, seems perfectly reasonable, and has plenty of support. I'm going to commit it shortly unless there are last minute objections. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Rare SSL failures on eelpout
On Wed, Mar 6, 2019 at 9:21 AM Tom Lane wrote: > The bug #15598 report is more troublesome, as we don't have a strong > reason to believe it's not common on Windows. However, I wonder whether > we can really do anything at all about that one. If I understand what > Andrew was hypothesizing in that thread, it was that Windows might be > dropping undelivered data on the floor once the server closes its end > of the connection. That would be totally broken behavior, but I never > expect anything else from Microsoft :-(. If that is an accurate theory > then rewriting libpq won't fix it. Here is a stupid Python 2.7 program to try to test that. Run one copy of it like this: $ python ./test.py --server The server will wait for a client, send a message immediately, and then close the socket after a second. The client will connect and send something once before and twice after the server closes the socket, and finally see if it can read the message from the server. Here's the output I get from the client on some different systems (the server was running on the same system): $ uname -a Linux debian 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23) x86_64 GNU/Linux $ python ./test.py --client Sending A... 2 Sending B... [Errno 104] Connection reset by peer Sending C... [Errno 32] Broken pipe This is the server saying goodbye $ uname -a Darwin macaque.local 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64 $ python2.7 ./test.py --client Sending A... 2 Sending B... [Errno 32] Broken pipe Sending C... [Errno 32] Broken pipe This is the server saying goodbye $ uname -a FreeBSD dogmatix 13.0-CURRENT FreeBSD 13.0-CURRENT c0873ea614a(master) GENERIC amd64 $ python2.7 ./test.py --client Sending A... 2 Sending B... 2 Sending C... 2 This is the server saying goodbye So... can anyone tell us what happens on Windows? 
(A secondary question might be what happens if the server and client are on different machines since I guess it could be different?)

--
Thomas Munro
https://enterprisedb.com

import socket
import sys
import time

address = ("localhost", )

def server():
    ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ss.bind(address)
    ss.listen(5)
    (s, other_address) = ss.accept()
    # Send a message, but then close the socket shortly after.
    # Will the client manage to read the goodbye message, if it tries to write
    # first?
    s.send("This is the server saying goodbye\n")
    time.sleep(1)
    s.close()

def client():
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(address)
    # Try to send a message immediately.  At this point we expect the server
    # end to be waiting for a second before closing, so it should be OK.
    try:
        print "Sending A..."
        sent = s.send("A\n")
        print sent
    except Exception, e:
        # (Should not be reached)
        print e
    # Wait until after the server has closed its socket, and try to send
    # another message.
    time.sleep(2)
    try:
        print "Sending B..."
        sent = s.send("B\n")
        print sent
    except Exception, e:
        # We expect an error either here or at the next write (?)
        print e
    # Send one more message, just to see if perhaps an error will be reported
    # here rather than earlier...
    try:
        print "Sending C..."
        sent = s.send("C\n")
        print sent
    except Exception, e:
        # What about now?
        print e
    # Can we read the goodbye message?
    print s.recv(1024)

if __name__ == "__main__":
    if sys.argv[1] == "--server":
        server()
    elif sys.argv[1] == "--client":
        client()
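For anyone re-running this probe today, here is a hedged Python 3 translation of the same experiment, collapsed into a single process with the server on a thread. The loopback address and OS-assigned port are my additions (the port number in the original script did not survive archiving), and the outcome of the B/C sends and the final read is exactly the platform-dependent behavior the original output listings show:

```python
import socket
import threading
import time

def run_probe():
    """Same experiment as test.py: the server sends a goodbye, sleeps a
    second, and closes; the client writes once before the close and twice
    after, then tries to read the goodbye."""
    results = {}

    ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    ss.bind(("127.0.0.1", 0))          # port 0: let the OS choose one
    ss.listen(5)
    port = ss.getsockname()[1]

    def server():
        s, _ = ss.accept()
        s.send(b"This is the server saying goodbye\n")
        time.sleep(1)
        s.close()                      # the client's "A" is still unread here

    t = threading.Thread(target=server)
    t.start()

    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.connect(("127.0.0.1", port))
    for label, delay in (("A", 0), ("B", 2), ("C", 0)):
        time.sleep(delay)
        try:
            results[label] = c.send(label.encode() + b"\n")   # bytes sent
        except OSError as e:
            results[label] = e.errno   # e.g. ECONNRESET or EPIPE
    try:
        results["goodbye"] = c.recv(1024)
    except OSError as e:
        results["goodbye"] = e.errno
    c.close()
    ss.close()
    t.join()
    return results
```

Only the first send is guaranteed to succeed; per the outputs quoted above, whether B and C fail, and whether the goodbye is still readable after the reset, varies by operating system.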
Re: Patch to document base64 encoding
Hello Karl,

Attached: doc_base64_v3.patch

I'm ok with referencing the historical MIME RFC. "RFC2045 section 6.8" -> "RFC 2045 Section 6.8"

You can link to the RFC directly with:

<ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC 2045 Section 6.8</ulink>

--
Fabien.
Re: ATTACH/DETACH PARTITION CONCURRENTLY
On Thu, Feb 28, 2019 at 3:27 PM Robert Haas wrote: > I'm not currently aware of any remaining correctness issues with this > code, although certainly there may be some. There has been a certain > dearth of volunteers to review any of this. I do plan to poke at it a > bit to see whether it has any significant performance impact, but not > today. Today, did some performance testing. I created a table with 100 partitions and randomly selected rows from it using pgbench, with and without -M prepared. The results show a small regression, but I believe it's below the noise floor. Five minute test runs. with prepared queries master: tps = 10919.914458 (including connections establishing) tps = 10876.271217 (including connections establishing) tps = 10761.586160 (including connections establishing) concurrent-attach: tps = 10883.535582 (including connections establishing) tps = 10868.471805 (including connections establishing) tps = 10761.586160 (including connections establishing) with simple queries master: tps = 1486.120530 (including connections establishing) tps = 1486.797251 (including connections establishing) tps = 1494.129256 (including connections establishing) concurrent-attach: tps = 1481.774212 (including connections establishing) tps = 1472.159016 (including connections establishing) tps = 1476.444097 (including connections establishing) Looking at the total of the three results, that's about an 0.8% regression with simple queries and an 0.2% regression with prepared queries. Looking at the median, it's about 0.7% and 0.07%. Would anybody like to argue that's a reason not to commit these patches? Would anyone like to argue that there is any other reason not to commit these patches? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
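The quoted percentages can be checked straight from the tps figures above, using the sums and medians of the three runs (the prepared-query total actually comes out nearer 0.1% than 0.2%; the 0.7% and 0.07% medians and the 0.8% simple-query total reproduce exactly):

```python
master_simple = [1486.120530, 1486.797251, 1494.129256]
attach_simple = [1481.774212, 1472.159016, 1476.444097]
master_prepared = [10919.914458, 10876.271217, 10761.586160]
attach_prepared = [10883.535582, 10868.471805, 10761.586160]

def total_regression_pct(master, patched):
    # regression measured on the sum of the three runs
    return 100.0 * (sum(master) - sum(patched)) / sum(master)

def median_regression_pct(master, patched):
    # the median of exactly three runs is the middle element
    med = lambda xs: sorted(xs)[1]
    return 100.0 * (med(master) - med(patched)) / med(master)
```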
Re: bgwriter_lru_maxpages limits in PG 10 sample conf
On Tue, Mar 5, 2019 at 12:24:14AM -0300, Alvaro Herrera wrote: > On 2019-Mar-04, Bruce Momjian wrote: > > > On Thu, Feb 28, 2019 at 10:28:44AM +0300, Sergei Kornilov wrote: > > > Hello > > > > > > postgresql.conf.sample was changed recently in REL_10_STABLE (commit > > > ab1d9f066aee4f9b81abde6136771debe0191ae8) > > > So config will be changed in next minor release anyway. We have another > > > reason to not fix bgwriter_lru_maxpages comment? > > > > Frankly, I was surprised postgresql.conf.sample was changed in a back > > branch since it will cause people who diff $PGDATA/postgresql.conf with > > share/postgresql.conf.sample to see differences they didn't make. > > I think the set of people that execute diffs of their production conf > file against the sample file to be pretty small -- maybe even empty. If > you're really interested in knowing the changes you've done, you're more > likely to use a version control system on the file anyway. Well, if this is true, then we should all agree to backpatch to postgresql.conf.sample more often. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Fwd: [Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed
-- Forwarded message - From: Siarhei Siniak Date: Tue, 5 Mar 2019 at 23:31 Subject: Re: [Issue] Can't recompile cube extension as PGXS, utils/float.h is not installed To: Tom Lane >AFAICT, that file only exists in HEAD, not in any released branch, and >it is installed during "make install" from HEAD. Please be sure you >are using installed files that match whatever branch you're trying >to build from. Yeah june and july 2018, a month in between. Just thought a release was not so long ago. ``` git log REL_11_BETA2..6bf0bc842bd75 --format=oneline -- | wc -l # 175 ``` Ok, then probably no more questions.
Re: Rare SSL failures on eelpout
Thomas Munro writes: > Bleugh. I think this OpenSSL package might just be buggy on ARM. On > x86, apparently the same version of OpenSSL and all other details of > the test the same, I can see that SSL_connect() returns <= 0 > (failure), and then we ask for that cert revoked message directly and > never even reach the startup packet sending code. On ARM, > SSL_connect() returns 1 (success) and then we proceed as discussed and > eventually get the error later (or not). So I think I should figure > out a minimal repro and report this to them. Yeah, I've still been unable to reproduce even with the sleep idea, so eelpout is definitely looking like a special snowflake from here. In any case, there seems little doubt that getting past SSL_connect() when the cert check has failed is an OpenSSL bug; I don't feel a need to create a workaround for it. The bug #15598 report is more troublesome, as we don't have a strong reason to believe it's not common on Windows. However, I wonder whether we can really do anything at all about that one. If I understand what Andrew was hypothesizing in that thread, it was that Windows might be dropping undelivered data on the floor once the server closes its end of the connection. That would be totally broken behavior, but I never expect anything else from Microsoft :-(. If that is an accurate theory then rewriting libpq won't fix it. I still think the redesign I suggested upthread would make things cleaner, but I don't have time or interest to make it happen in the near future if it's not fixing an observable bug. regards, tom lane
Re: using index or check in ALTER TABLE SET NOT NULL
Hello, David!

> I've made a pass over v10. I think it's in pretty good shape, but I
> did end up changing a few small things.

Thank you! I merged your changes into the new patch version.

> The only thing that I'm a bit unsure of is the tests. I've read the
> thread and I see the discussion above about it. I'd personally have
> thought INFO was fine since ATTACH PARTITION does that, but I see
> objections.

It is unclear to me whether we have consensus about the INFO ereport in ATTACH PARTITION.

> It appears that all the tests just assume that the CHECK
> constraint was used to validate the SET NOT NULL. I'm not all that
> sure if there's any good reason not to set client_min_messages =
> 'debug1' just before your test then RESET client_min_messages;
> afterwards. No other tests seem to do it, but my only thoughts on the
> reason not to would be that it might fail if someone added another
> debug somewhere, but so what? they should update the expected results
> too.

I'm not sure we have consensus about debug messages in tests, so I made those test changes in an additional patch.

> separate that part out and
> put back the function named PartConstraintImpliedByRelConstraint and
> have it do the IS NOT NULL building before calling
> ConstraintImpliedByRelConstraint(). No duplicate code that way and you
> also don't need to touch partbound.c

In this case we need an extra argument to ConstraintImpliedByRelConstraint for some caller-provided existConstraint, right? Along with the Relation itself? Then I need to make a copy of existConstraint, append the relation constraints, and call predicate_implied_by.
If I understood correctly, the pseudocode would be:

PartConstraintImpliedByRelConstraint(rel, testConstraint)
    generate notNullConstraint from NOT NULL table attributes
    call ConstraintImpliedByRelConstraint(rel, testConstraint, notNullConstraint)

ConstraintImpliedByRelConstraint(rel, testConstraint, existsConstraint)
    copy existsConstraint to localExistsConstraint variable
    append relation constraints to localExistsConstraint list
    call predicate_implied_by

I think the redundant IS NOT NULL check is not an issue here, and we do not need extra arguments for ConstraintImpliedByRelConstraint. This was not changed in the attached version, but I will change it if you think it would be a better design.

regards, Sergei

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 890b23afd6..926b3361ea 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -214,8 +214,17 @@ WITH ( MODULUS numeric_literal, REM
       These forms change whether a column is marked to allow null
-      values or to reject null values. You can only use SET
-      NOT NULL when the column contains no null values.
+      values or to reject null values.
+
+
+
+      SET NOT NULL may only be applied to a column
+      providing none of the records in the table contain a
+      NULL value for the column. Ordinarily this is
+      checked during the ALTER TABLE by scanning the
+      entire table, however if a valid CHECK constraint is
+      found which proves no NULL can exist, then the
+      table scan is skipped.
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a93b13c2fe..b03e1bc875 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -158,7 +158,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
-	bool		new_notnull;	/* T if we added new NOT NULL constraints */
+	bool		verify_new_notnull; /* T if we should recheck NOT NULL */
 	int			rewrite;		/* Reason for forced rewrite, if any */
 	Oid			newTableSpace;	/* new tablespace; 0 means no change */
 	bool		chgPersistence; /* T if SET LOGGED/UNLOGGED is used */
@@ -370,6 +370,7 @@ static ObjectAddress ATExecDropNotNull(Relation rel, const char *colName, LOCKMO
 static void ATPrepSetNotNull(Relation rel, bool recurse, bool recursing);
 static ObjectAddress ATExecSetNotNull(AlteredTableInfo *tab, Relation rel,
 				  const char *colName, LOCKMODE lockmode);
+static bool NotNullImpliedByRelConstraints(Relation rel, Form_pg_attribute attr);
 static ObjectAddress ATExecColumnDefault(Relation rel, const char *colName,
 					 Node *newDefault, LOCKMODE lockmode);
 static ObjectAddress ATExecAddIdentity(Relation rel, const char *colName,
@@ -4506,10 +4507,11 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
 	else
 	{
 		/*
-		 * Test the current data within the table against new constraints
-		 * generated by ALTER TABLE commands, but don't rebuild data.
+		 * If required, test the current data within the table against new
+		 * constraints generated by ALTER TABLE commands, but don't rebuild
+		 * data.
 		 */
-		if (tab->constraints != NIL || tab->new_notnull ||
+		if (tab->constraints !=
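The shape of the refactoring being debated is easier to see with a toy model (hypothetical Python, nothing like the real predicate_implied_by machinery): here a "constraint" is reduced to the set of columns it proves NOT NULL, the core function accepts an optional caller-provided set of extra facts, and a wrapper derives those facts from the table's NOT NULL attributes before delegating:

```python
def constraint_implied_by_rel_constraints(rel_checks, test_cols,
                                          extra_facts=frozenset()):
    """Toy implication test: each constraint is the set of columns it
    proves NOT NULL, and implication is just set containment."""
    known = set(extra_facts)        # caller-provided existing constraints
    for check in rel_checks:        # the relation's CHECK constraints
        known |= check
    return test_cols <= known

def not_null_implied(rel_checks, not_null_attrs, col):
    """PartConstraintImpliedByRelConstraint-style wrapper: build the
    IS NOT NULL facts first, then call the core function."""
    return constraint_implied_by_rel_constraints(rel_checks, {col},
                                                 extra_facts=not_null_attrs)
```

The design question in the thread is only whether that extra-facts argument belongs on the core function or whether the wrapper should fold the facts in itself; either way the implication check is shared.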
Re: any plan to support shared servers like Oracle in PG?
There already are solutions for this feature in Postgres, under the "connection pooler" name. See:

pgpool: http://www.pgpool.net/mediawiki/index.php/Main_Page
pgbouncer: https://pgbouncer.github.io/

There are also discussions about including this as a core feature: https://www.postgresql.org/message-id/flat/4b971a8f-ff61-40eb-8f30-7b57eb0fdf9d%40postgrespro.ru

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Rare SSL failures on eelpout
On Wed, Mar 6, 2019 at 7:05 AM Thomas Munro wrote: > On Wed, Mar 6, 2019 at 6:07 AM Tom Lane wrote: > > Annoying. I'd be happier about writing code to fix this if I could > > reproduce it :-( > > Hmm. Note that eelpout only started doing it with OpenSSL 1.1.1. Bleugh. I think this OpenSSL package might just be buggy on ARM. On x86, apparently the same version of OpenSSL and all other details of the test the same, I can see that SSL_connect() returns <= 0 (failure), and then we ask for that cert revoked message directly and never even reach the startup packet sending code. On ARM, SSL_connect() returns 1 (success) and then we proceed as discussed and eventually get the error later (or not). So I think I should figure out a minimal repro and report this to them. -- Thomas Munro https://enterprisedb.com
Re: New vacuum option to do only freezing
On 3/5/19, 1:22 AM, "Masahiko Sawada" wrote: > On Tue, Mar 5, 2019 at 8:27 AM Bossart, Nathan wrote: >> + VACUUM removes dead tuples and prunes HOT-updated >> + tuples chain for live tuples on table. If the table has any dead tuple >> + it removes them from both table and indexes for re-use. With this >> + option VACUUM doesn't completely remove dead tuples >> + and disables removing dead tuples from indexes. This is suitable for >> + avoiding transaction ID wraparound (see >> + ) but not sufficient for >> avoiding >> + index bloat. This option is ignored if the table doesn't have index. >> + This cannot be used in conjunction with FULL >> + option. >> >> There are a couple of small changes I would make. How does something >> like this sound? >> >> VACUUM removes dead tuples and prunes HOT-updated tuple chains for >> live tuples on the table. If the table has any dead tuples, it >> removes them from both the table and its indexes and marks the >> corresponding line pointers as available for re-use. With this >> option, VACUUM still removes dead tuples from the table, but it >> does not process any indexes, and the line pointers are marked as >> dead instead of available for re-use. This is suitable for >> avoiding transaction ID wraparound (see Section 24.1.5) but not >> sufficient for avoiding index bloat. This option is ignored if >> the table does not have any indexes. This cannot be used in >> conjunction with the FULL option. > > Hmm, that's good idea but I'm not sure that user knows the word 'line > pointer' because it is used only at pageinspect document. I wonder if > the word 'item identifier' would rather be appropriate here because > this is used at at describing page layout(in storage.sgml). Thought? That seems reasonable to me. It seems like ItemIdData is referred to as "item pointer," "line pointer," or "item identifier" depending on where you're looking, but ItemPointerData is also referred to as "item pointer." 
I think using "item identifier" is appropriate here for clarity and consistency with storage.sgml. >> tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, >> false, >> - >> >latestRemovedXid); >> + >> >latestRemovedXid, >> + >> _pruned); >> >> Why do we need a separate tups_pruned argument in heap_page_prune()? >> Could we add the result of heap_page_prune() to tups_pruned instead, >> then report the total number of removed tuples as tups_vacuumed + >> tups_pruned elsewhere? > > Hmm, I thought that we should report only the number of tuples > completely removed but we already count the tulples marked as > redirected as tups_vacuumed. Let me summarize the fate of dead tuples. > I think we can roughly classify dead tuples as follows. > > 1. root tuple of HOT chain that became dead > 2. root tuple of HOT chain that became redirected > 3. other tupels of HOT chain that became unused > 4. tuples that became dead after HOT pruning > > The tuples of #1 through #3 either have only ItemIDs or have been > completely removed but tuples of #4 has its tuple storage because they > are not processed when HOT-pruning. > > Currently tups_vacuumed counts all of them, nleft (= > vacrelstats->num_dead_tuples) counts #1 + #4. I think that the number > of removed tuples being reported would be #1 + #2 + #3. Or should we > use #2 + #3 instead? I think I'm actually more in favor of what was in v6. IIRC that version of the patch didn't modify how we tracked the "removed" tuples at all, but it just added the "X item identifiers left marked dead" metric. Since even the tuples we are leaving marked dead lose storage, that seems accurate enough to me. >> Another interesting thing I noticed is that this "removed X row >> versions" message is only emitted if vacuumed_pages is greater than 0. >> However, if we only did HOT pruning, tups_vacuumed will be greater >> than 0 while vacuumed_pages will still be 0, so some information will >> be left out. 
I think this is already the case, though, so this could >> probably be handled in a separate thread. >> > > Hmm, since this log message is corresponding to the one that > lazy_vacuum_heap makes and total number of removed tuples are always > reported, it seems consistent to me. Do you have another point? Here's an example: postgres=# CREATE TABLE test (a INT, b INT); CREATE TABLE postgres=# CREATE INDEX ON test (a); CREATE INDEX postgres=# INSERT INTO test VALUES (1, 2); INSERT 0 1 After only HOT updates, the "removed X row versions in Y pages" message is not emitted: postgres=# UPDATE test SET b = 3; UPDATE 1 postgres=#
Windows 32 bit vs circle test
We don't currently have any buildfarm animals running 32 bit mingw builds for releases > 10. As part of my testing of msys 2 I thought I would try its 32 bit compiler and got this regression diff on HEAD

cheers

andrew

diff -w -U3 C:/tools/msys64/home/Administrator/bf/root/HEAD/pgsql/src/test/regress/expected/circle.out C:/tools/msys64/home/Administrator/bf/root/HEAD/pgsql.build/src/test/regress/results/circle.out
--- C:/tools/msys64/home/Administrator/bf/root/HEAD/pgsql/src/test/regress/expected/circle.out 2019-03-03 17:14:38.648207000 +
+++ C:/tools/msys64/home/Administrator/bf/root/HEAD/pgsql.build/src/test/regress/results/circle.out 2019-03-04 19:33:37.576494900 +
@@ -111,8 +111,8 @@ WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0) ORDER BY distance, area(c1.f1), area(c2.f1);
  five | one | two | distance
---+++--
- | <(3,5),0> | <(1,2),3> | 0.60555127546399
+--+++---
+ | <(3,5),0> | <(1,2),3> | 0.605551275463989
  | <(3,5),0> | <(5,1),3> | 1.47213595499958
  | <(100,200),10> | <(100,1),115> | 74
  | <(100,200),10> | <(1,2),100> | 111.370729772479

--
Andrew Dunstan
https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Inheriting table AMs for partitioned tables
On Tue, Mar 5, 2019 at 12:59 PM Andres Freund wrote: > Based on this mail I'm currently planning to simply forbid specifying > USING for partitioned tables. Then we can argue about this later. +1. I actually think that might be the right thing in the long-term, but it undeniably avoids committing to any particular decision in the short term, which seems good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Ordered Partitioned Table Scans
On Wed, Dec 19, 2018 at 5:08 PM David Rowley wrote: > With my idea for using live_parts, we'll process the partitions > looking for interleaved values on each query, after pruning takes > place. In this case, we'll see the partitions are naturally ordered. I > don't really foresee any issues with that additional processing since > it will only be a big effort when there are a large number of > partitions, and in those cases the planner already has lots of work to > do. Such processing is just a drop in the ocean when compared to path > generation for all those partitions. I agree that partitions_are_ordered() is cheap enough in this patch that it probably doesn't matter whether we cache the result. On the other hand, that's mostly because you haven't handled the hard cases - e.g. interleaved list partitions. If you did, then it would be expensive, and it probably *would* be worth caching the result. Now maybe those hard cases aren't worth handling anyway. You also seem to be saying that since run-time partition pruning might change the answer, caching the initial answer is pointless. But I think Julien has made a good argument for why that's wrong: if the initial answer is that the partitions are ordered, which will often be true, then we can skip all later checks. So I am OK with the fact that this patch doesn't choose to cache it, but I don't really buy any of your arguments for why it would be a bad idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
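A sketch of why the check is cheap as the patch stands (hypothetical function names, not the patch's code): for range partitions one linear pass over the declared bounds suffices, and for list partitions the same kind of pass *detects* interleaving — it's producing a usable ordering for interleaved list partitions, the unhandled "hard case" above, that would be expensive and worth caching:

```python
def range_partitions_are_ordered(bounds):
    """bounds: [(lo, hi), ...] in declared partition order.  Range
    partitions are ordered iff consecutive bounds don't overlap:
    a single linear pass."""
    return all(b1[1] <= b2[0] for b1, b2 in zip(bounds, bounds[1:]))

def list_partitions_are_ordered(value_lists):
    """List partitions are naturally ordered iff no partition's values
    interleave with a later partition's: every value in partition i
    must sort before every value in partition i+1."""
    return all(max(a) <= min(b)
               for a, b in zip(value_lists, value_lists[1:]))
```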
Re: Rare SSL failures on eelpout
On Wed, Mar 6, 2019 at 6:07 AM Tom Lane wrote: > Thomas Munro writes: > > You can see that poll() already knew the other end had closed the > > socket. Since this is clearly timing... let's see, yeah, I can make > > it fail every time by adding sleep(1) before the comment "Send the > > startup packet.". I assume that'll work on any Linux machine? > > Great idea, but no cigar --- doesn't do anything for me except make > the ssl test really slow. (I tried it on RHEL6 and Fedora 28 and, just > for luck, current macOS.) What this seems to prove is that the thing > that's different about eelpout is the particular kernel it's running, > and that that kernel has some weird timing behavior in this situation. > > I've also been experimenting with reducing libpq's SO_SNDBUF setting > on the socket, with more or less the same idea of making the sending > of the startup packet slower. No joy there either. > > Annoying. I'd be happier about writing code to fix this if I could > reproduce it :-( Hmm. Note that eelpout only started doing it with OpenSSL 1.1.1. But I just tried the sleep(1) trick on an x86 box running the same version of Debian, OpenSSL etc and it didn't work. So eelpout (a super cheap virtualised 4-core ARMv8 system rented from scaleway.com running Debian Buster with a kernel identifying itself as 4.9.23-std-1 and libc6 2.28-7) is indeed starting to look pretty weird. Let me know if you want to log in and experiment on that machine. -- Thomas Munro https://enterprisedb.com
Re: Inheriting table AMs for partitioned tables
On 2019-03-04 22:08:04 -0800, Andres Freund wrote: > Hi, > > On 2019-03-05 16:01:50 +1300, David Rowley wrote: > > On Tue, 5 Mar 2019 at 12:47, Andres Freund wrote: > > > CREATE TABLE tableam_parted_heap2 (a text, b int) PARTITION BY list (a) > > > USING heap2; > > > > > > SET default_table_access_method = 'heap'; > > > CREATE TABLE tableam_parted_a_heap2 PARTITION OF tableam_parted_heap2 FOR > > > VALUES IN ('a'); > > > > > > > But for tableam_parted_a_heap2 tableam_parted_b_heap2 the answer isn't > > > quite as clear. I think it'd both be sensible for new partitions to > > > inherit the AM from the root, but it'd also be sensible to use the > > > current default. > > > > I'd suggest it's made to work the same way as ca4103025dfe26 made > > tablespaces work. > > Hm, is that actually correct? Because as far as I can tell that doesn't > have the necessary pg_dump code to make this behaviour persistent: > > CREATE TABLESPACE frak LOCATION '/tmp/frak'; > CREATE TABLE test_tablespace (a text, b int) PARTITION BY list (a) TABLESPACE > frak ; > CREATE TABLE test_tablespace_1 PARTITION OF test_tablespace FOR VALUES in > ('a'); > CREATE TABLE test_tablespace_2 PARTITION OF test_tablespace FOR VALUES in > ('b') TABLESPACE pg_default; > CREATE TABLE test_tablespace_3 PARTITION OF test_tablespace FOR VALUES in > ('c') TABLESPACE frak; > > SELECT relname, relkind, reltablespace FROM pg_class WHERE relname LIKE > 'test_tablespace%' ORDER BY 1; > ┌───┬─┬───┐ > │ relname │ relkind │ reltablespace │ > ├───┼─┼───┤ > │ test_tablespace │ p │ 16384 │ > │ test_tablespace_1 │ r │ 16384 │ > │ test_tablespace_2 │ r │ 0 │ > │ test_tablespace_3 │ r │ 16384 │ > └───┴─┴───┘ > (4 rows) > > but a dump outputs (abbreviated) > > SET default_tablespace = frak; > CREATE TABLE public.test_tablespace ( > a text, > b integer > ) > PARTITION BY LIST (a); > CREATE TABLE public.test_tablespace_1 PARTITION OF public.test_tablespace > FOR VALUES IN ('a'); > SET default_tablespace = ''; > CREATE TABLE 
public.test_tablespace_2 PARTITION OF public.test_tablespace > FOR VALUES IN ('b'); > SET default_tablespace = frak; > CREATE TABLE public.test_tablespace_3 PARTITION OF public.test_tablespace > FOR VALUES IN ('c'); > > which restores to: > > postgres[32125][1]=# SELECT relname, relkind, reltablespace FROM pg_class > WHERE relname LIKE 'test_tablespace%' ORDER BY 1; > ┌───┬─┬───┐ > │ relname │ relkind │ reltablespace │ > ├───┼─┼───┤ > │ test_tablespace │ p │ 16384 │ > │ test_tablespace_1 │ r │ 16384 │ > │ test_tablespace_2 │ r │ 16384 │ > │ test_tablespace_3 │ r │ 16384 │ > └───┴─┴───┘ > (4 rows) > > because public.test_tablespace_2 assumes it's ought to inherit the > tablespace from the partitioned table. > > > I also find it far from clear that: > > > The tablespace_name is the > name > of the tablespace in which the new table is to be created. > If not specified, >is consulted, or >if the table is temporary. For > partitioned tables, since no storage is required for the table itself, > the tablespace specified here only serves to mark the default tablespace > for any newly created partitions when no other tablespace is explicitly > specified. > > > is handled correctly. The above says that the *specified* tablespaces - > which seems to exclude the default tablespace - is what's going to > determine what partitions use as their default tablespace. But in fact > that's not true, the partitioned table's pg_class.retablespace is set to > what default_tablespaces was at the time of the creation. > > > > i.e. if they specify the storage type when creating > > the partition, then always use that, unless they mention otherwise. If > > nothing was mentioned when they created the partition, then use > > default_table_access_method. > > Hm. That'd be doable, but given the above ambiguities I'm not convinced > that's the best approach. 
As far as I can see that'd require: > > 1) At relation creation, for partitioned tables only, do not take >default_table_access_method into account. > > 2) At partition creation, if the AM is not specified and if the >partitioned table's relam is 0, use the default_table_access_method. > > 3) At pg_dump, for partitioned tables only, explicitly emit a USING >... rather than use the method of manipulating default_table_access_method. > > As far as I can tell, the necessary steps are also what'd need to be > done to actually implement the described behaviour for TABLESPACE (with > s/default_table_access_method/default_tablespace/ and
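The three steps above amount to a small resolution rule plus the pg_dump change. As a hedged model of just the resolution part (hypothetical names, with None standing in for relam = 0 — not PostgreSQL code):

```python
def effective_am(explicit_am, parent_relam, default_am, *,
                 is_partitioned, is_partition):
    """Proposed rules: (1) partitioned tables store no AM unless USING is
    given, ignoring the GUC default; (2) partitions inherit the parent's
    stored AM, falling back to default_table_access_method only when the
    parent stored nothing."""
    if explicit_am is not None:
        return explicit_am                 # USING ... always wins
    if is_partitioned:
        return None                        # step 1: relam stays 0
    if is_partition and parent_relam is not None:
        return parent_relam                # inherit the parent's explicit AM
    return default_am                      # step 2 fallback
```

Under this rule, pg_dump emitting an explicit USING for partitioned tables (step 3) is what keeps the restored catalog state identical, avoiding the tablespace-style round-trip loss shown above.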
Re: insensitive collations
Peter Eisentraut wrote: > Older ICU versions (<54) don't support all the locale customization > options, so many of my new tests in collate.icu.utf8.sql will fail on > older systems. What should we do about that? Have another extra test file? Maybe stick to the old-style syntax for the regression tests? The declarations that won't work as expected with older ICU versions would be: CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); 'und-u-ks-level2' is equivalent to 'und@colStrength=secondary' CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false); 'und-u-ks-level1-kc-true' => 'und@colStrength=primary;colCaseLevel=yes' Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
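For intuition about what the two collations mean, here is a deliberately rough Python approximation of the ICU strength levels (real ICU compares collation elements, not codepoints): colStrength=secondary keeps base letters and accents but folds case, while colStrength=primary with colCaseLevel=yes drops accents but keeps case:

```python
import unicodedata

def case_insensitive_key(s):
    # und-u-ks-level2: compare base letters + accents, ignore case
    return unicodedata.normalize("NFD", s).casefold()

def ignore_accents_key(s):
    # und-u-ks-level1-kc-true: drop combining marks (accents), keep case
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))
```

So under case_insensitive, 'Résumé' and 'résumé' compare equal but 'resume' does not; under ignore_accents, 'Résumé' and 'Resume' compare equal but 'résumé' does not.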
Re: proposal: new polymorphic types - commontype and commontypearray
On Tue, Mar 5, 2019 at 15:35, Tom Lane wrote:
> David Steele writes:
> > This thread has been very quiet for a month. I agree with Andres [1]
> > that we should push this to PG13.
>
> I think the main thing it's blocked on is disagreement on what the
> type name should be, which is kind of a silly thing to get blocked on,
> but nonetheless it's important ...

I sent some other possible names, but that mail was probably forgotten. What about "ctype" as a shortcut for "common type"? And carraytype, cnonarraytype?

Regards

Pavel

> regards, tom lane
Re: Re: Optimze usage of immutable functions as relation
On 2/28/19 4:27 PM, Alexander Kuzmenkov wrote: On 2/18/19 03:20, Tom Lane wrote: The dummy-relation stuff I referred to has now been merged, so there's really no good reason not to revise the patch along that line. I'll try to post the revised implementation soon. I'll close this on March 8th if there is no new patch. Regards, -- -David da...@pgmasters.net
Re: Re: \describe*
> > > I agree with Andres and Robert. This patch should be pushed to PG13. > > I'll do that on March 8 unless there is a compelling argument not to. > > No objection. I'll continue to work on it, though.
Re: Refactoring the checkpointer's fsync request queue
Thomas Munro writes: > +#include "fmgr.h" > +#include "storage/block.h" > +#include "storage/relfilenode.h" > +#include "storage/smgr.h" > +#include "storage/sync.h" > Why do we need to include fmgr.h in md.h? More generally, any massive increase in an include file's inclusions is probably a sign that you need to refactor. Cross-header inclusions are best avoided altogether if you can --- obviously that's not always possible, but we should minimize them. We've had some very unfortunate problems in the past from indiscriminate #includes in headers. regards, tom lane
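The complaint is easy to audit mechanically. A hedged sketch of a check one could run over a proposed header (not an existing PostgreSQL tool; the md.h include list below is copied from the quoted review):

```python
import re

INCLUDE_RE = re.compile(r'^\s*#\s*include\s+[<"]([^>"]+)[>"]', re.MULTILINE)

def header_includes(header_text):
    """List the files a header pulls in, to spot headers whose inclusion
    list balloons during a refactoring."""
    return INCLUDE_RE.findall(header_text)

md_h = '''
#include "fmgr.h"
#include "storage/block.h"
#include "storage/relfilenode.h"
#include "storage/smgr.h"
#include "storage/sync.h"
'''
```

Diffing such lists before and after a patch makes a "massive increase in an include file's inclusions" visible at review time rather than as downstream build breakage.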
Re: jsonpath
On Mon, Mar 4, 2019 at 6:27 PM Tomas Vondra wrote: > 11) Wording of some of the error messages in the execute methods seems a > bit odd. For example executeNumericItemMethod may complain that it > > ... is applied to not a numeric value > > but perhaps a more natural wording would be > > ... is applied to a non-numeric value > > And similarly for the other execute methods. But I'm not a native > speaker, so perhaps the original wording is just fine. As a native speaker I can confirm that the first wording is definitely not OK. The second one is tolerable, but I wonder if there is something better, like "can only be applied to a numeric value" or maybe there's a way to rephrase it so that we complain about the non-numeric value itself rather than the application, e.g. ERROR: json_frobnitz can only frob numeric values or ERROR: argument to json_frobnitz must be numeric. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
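To make the suggested wording concrete, here is a toy stand-in for the numeric item methods (hypothetical Python, not the jsonpath executor) that raises the error in the preferred phrasing — stating what the method requires rather than "applied to not a numeric value":

```python
import math

_NUMERIC_METHODS = {"abs": abs, "floor": math.floor, "ceiling": math.ceil}

def apply_numeric_item_method(method, value):
    # SQL/JSON path defines .abs(), .floor(), .ceiling() on numbers only
    if not isinstance(value, (int, float)) or isinstance(value, bool):
        raise TypeError("jsonpath item method .%s() can only be applied "
                        "to a numeric value" % method)
    return _NUMERIC_METHODS[method](value)
```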
Re: Refactoring the checkpointer's fsync request queue
On Wed, Mar 6, 2019 at 5:07 AM Shawn Debnath wrote: > Confirmed. Patch shows 8900 ms vs 192 ms on master for the insert test. > Interesting! It's reproducible so should be able to figure out what's > going on. The only thing we do in ForwardSyncRequest() is split up the 8 > bits into 2x4 bits and copy the FileTagData structure to the > checkpointer queue. Will report back what I find. More review, all superficial stuff: +typedef struct +{ + RelFileNode rnode; + ForkNumber forknum; + SegmentNumber segno; +} FileTagData; + +typedef FileTagData *FileTag; Even though I know I said we should take FileTag by pointer, and even though there is an older tradition in the tree of having a struct named "FooData" and a corresponding pointer typedef named "Foo", as far as I know most people are not following the convention for new code and I for one don't like it. One problem is that there isn't a way to make a pointer-to-const type given a pointer-to-non-const type, so you finish up throwing away const from your programs. I like const as documentation and a tiny bit of extra compiler checking. What do you think about "FileTag" for the struct and eg "const FileTag *tag" when receiving one as a function argument? -/* internals: move me elsewhere -- ay 7/94 */ Aha, about time too! +#include "fmgr.h" +#include "storage/block.h" +#include "storage/relfilenode.h" +#include "storage/smgr.h" +#include "storage/sync.h" Why do we need to include fmgr.h in md.h? +/* md storage manager funcationality */ Typo. +/* md sync callback forward declarations */ These aren't "forward" declarations, they're plain old declarations. +extern char* mdfilepath(FileTag ftag); Doesn't really matter too much because all of this will get pgindent-ed at some point, but FYI we write "char *md", not "char* md". #include "storage/smgr.h" +#include "storage/md.h" #include "utils/hsearch.h" Bad sorting. 
+	FileTagData tag;
+	tag.rnode = reln->smgr_rnode.node;
+	tag.forknum = forknum;
+	tag.segno = seg->mdfd_segno;

I wonder if it would be better practice to zero-initialise that sucker,
so that if more members are added we don't leave them uninitialised.  I
like the syntax "FileTagData tag = {{0}}".  (Unfortunately extra
nesting required here because the first member is a struct, and C99
doesn't allow us to use empty {} like C++, even though some versions of
GCC accept it.  Rats.)

-- 
Thomas Munro
https://enterprisedb.com
Re: Rare SSL failures on eelpout
On Wed, Mar 6, 2019 at 3:33 AM Tom Lane wrote:
> Thomas Munro writes:
> > Disappointingly, that turned out to be just because 10 and earlier
> > didn't care what the error message said.
>
> That is, you can reproduce the failure on old branches?  That lets
> out a half-theory I'd had, which was that Andres' changes to make
> the backend always run its socket in nonblock mode had had something
> to do with it.  (Those changes do represent a plausible reason why
> SSL_shutdown might be returning WANT_READ/WANT_WRITE; but I'm not
> in a hurry to add such code without evidence that it actually
> happens and something useful would change if we retry.)

Yes, on REL_10_STABLE:

$ for i in `seq 1 1000`; do
    psql "host=localhost port=56024 dbname=certdb user=postgres sslcert=ssl/client-revoked.crt sslkey=ssl/client-revoked.key"
  done
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
...
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
could not send startup packet: Connection reset by peer
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked
psql: SSL error: sslv3 alert certificate revoked

Zooming in with strace:

sendto(3, "\27\3\3\2\356\r\214\352@\21\320\202\236}\376\367\262\227\177\255\212\204`q\254\108\326\201+c)"..., 1115, MSG_NOSIGNAL, NULL, 0) = 1115
ppoll([{fd=3, events=POLLOUT|POLLERR}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLOUT|POLLERR|POLLHUP}])
sendto(3, "\27\3\3\0cW_\210\337Q\227\360\216k\221\346\372pw\27\325P\203\357\245km\304Rx\355\200"..., 104, MSG_NOSIGNAL, NULL, 0) = -1 ECONNRESET (Connection reset by peer)

You can see that poll() already knew the other end had closed the
socket.  Since this is clearly timing... let's see, yeah, I can make it
fail every time by adding sleep(1) before the comment "Send the startup
packet.".  I assume that'll work on any Linux machine?

To set this test up, I ran a server with the following config:

ssl=on
ssl_ca_file='root+client_ca.crt'
ssl_cert_file='server-cn-only.crt'
ssl_key_file='server-cn-only.key'
ssl_crl_file='root+client.crl'

I copied those files out of src/test/ssl/ssl/.  Then I ran the psql
command shown earlier.  I think I had to chmod 600 the keys.

-- 
Thomas Munro
https://enterprisedb.com
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada wrote:
> >> === Discussion points ===
> >>
> >>   - Progress counter for "3. sorting tuples" phase
> >>     - Should we add pgstat_progress_update_param() in tuplesort.c like a
> >>       "trace_sort"?
> >>       Thanks to Peter Geoghegan for the useful advice!
> >
> > How would we avoid an abstraction violation?
>
> Hmm... What do you mean an abstraction violation?
> If it is difficult to solve, I'd not like to add the progress counter
> for the sorting tuples.

What I mean is... I think it would be useful to have this counter, but
I'm not sure how the tuplesort code would know to update the counter in
this case and not in other cases.  The tuplesort code is used for lots
of things; we can't update a counter for CLUSTER if the tuplesort is
being used for CREATE INDEX or a Sort node in a query or whatever.  So
my question is how we would indicate to the tuplesort that it needs to
do the counter update, and whether that would end up making for ugly
code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Question about commit 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5
On Tue, Mar 5, 2019 at 3:00 AM Etsuro Fujita wrote:
> apply_projection_to_path() not only jams the given tlist into the
> existing path but updates its tlist eval costs appropriately except for
> the cases of Gather and GatherMerge:

I had forgotten that detail, but I don't think it changes the basic
picture.  Once you've added a bunch of Paths to a RelOptInfo, it's too
late to change their *relative* cost, because add_path() puts the list
in a certain order, and adjusting the path costs won't change that
ordering.  You've got to have the costs already correct at the time
add_path() is first called for any given Path.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: speeding up planning with partitions
On 3/5/19 5:24 AM, Amit Langote wrote:
> Attached an updated version.  This incorporates fixes for both Jesper's
> and Imai-san's review.  I haven't been able to pin down the bug (or
> whatever) that makes throughput go down as the partition count
> increases, when tested with a --enable-cassert build.

Thanks!  I'm seeing the throughput going down as well, but are you sure
it isn't just the extra calls of MemoryContextCheck you are seeing?  A
flamegraph diff highlights that area -- sent offline.  A non-cassert
build shows the same profile for 64 and 1024 partitions.

Best regards,
Jesper
Re: Delay locking partitions during query execution
On 3/5/19 6:55 AM, David Rowley wrote:
> On Sat, 2 Feb 2019 at 02:52, Robert Haas wrote:
>> I think the key question here is whether or not you can cope with
>> someone having done arbitrary AEL-requiring modifications to the
>> delaylocked partitions. If you can, the fact that the plan might
>> sometimes be out-of-date is an inevitable consequence of doing this at
>> all, but I think we can argue that it's an acceptable consequence as
>> long as the resulting behavior is not too bletcherous. If you can't,
>> then this patch is dead.
>
> I spent some time looking at this patch again and thinking about the
> locking. I ended up looking through each alter table subcommand by way
> of AlterTableGetLockLevel() and going through scenarios with each of
> them in my head to try to understand:
>
> a) If the subcommand can even be applied to a leaf partition; and
> b) Can the subcommand cause a cached plan to become invalid?
>
> I did all the easy ones first then started on the harder ones. I came
> to AT_DropConstraint and imagined the following scenario:
>
> -- ** session 1
> create table listp (a int) partition by list(a);
> create table listp1 partition of listp for values in(1);
> create table listp2 partition of listp for values in(2);
>
> create index listp1_a_idx on listp1 (a);
> create index listp2_a_idx on listp2 (a);
>
> set enable_seqscan = off;
> set plan_cache_mode = force_generic_plan;
> prepare q1 (int) as select * from listp where a = $1;
> execute q1 (1);
> begin;
> execute q1 (1);
>
> -- ** session 2
> drop index listp2_a_idx;
>
> -- ** session 1
> execute q1 (2);
> ERROR:  could not open relation with OID 16401
>
> The only way I can think to fix this is to just never lock partitions
> at all, and if a lock is to be obtained on a partition, it must be
> instead obtained on the top-level partitioned table.
> That's a rather
> large change that could have large consequences, and I'm not even sure
> it's possible since we'd need to find the top-level parent before
> obtaining the lock, by then the hierarchy might have changed and we'd
> need to recheck, which seems like quite a lot of effort just to obtain
> a lock... Apart from that, it's not this patch, so looks like I'll
> need to withdraw this one :-(

So you're saying we could

1) lookup the parent and lock it
2) repeat the lookup to verify it did not change

I think that could still be a win, assuming that most hierarchies will
be rather shallow (I'd say 2-3 levels will cover like 95% of cases, and
4 levels would be 100% in practice).  And the second lookup should be
fairly cheap thanks to syscache and the fact that the hierarchies do
not change very often.

I can't judge how invasive this patch would be, but I agree it's more
complex than the originally proposed patch.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
any plan to support shared servers like Oracle in PG?
Currently there is one process per connection, which is not very good
for short-lived connections.  Oracle supports a shared server mode in
which one server process can serve more than one user at the same time.
See https://docs.oracle.com/cd/B28359_01/server.111/b28310/manproc001.htm#ADMIN11166

Is there any plan for something like this?
Re: Should we increase the default vacuum_cost_limit?
On Tue, Mar 5, 2019 at 7:53 AM Tomas Vondra wrote:
> But on the other hand it feels a bit weird that we increase this one
> value and leave all the other (also very conservative) defaults alone.

Are you talking about vacuum-related defaults or defaults in general?
In 2014, we increased the defaults for work_mem and
maintenance_work_mem by 4x and the default for effective_cache_size by
32x; in 2012, we increased the default for shared_buffers by 4x.  It's
possible some of those parameters should be further increased at some
point, but deciding not to increase any of them until we can increase
all of them is tantamount to giving up on changing anything at all.

I think it's OK to be conservative by default, but we should increase
parameters where we know that the default is likely to be too
conservative for 99% of users.  My only question about this change is
whether to go for a lesser multiple (e.g. 4x) rather than the proposed
10x.  But I think even if 10x turns out to be too much for a few more
people than we'd like, we're still going to be better off increasing it
and having some people have to turn it back down again than leaving it
the way it is and having users regularly suffer vacuum starvation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Mon, Mar 4, 2019 at 1:01 AM Masahiko Sawada wrote:
> I think that there is no need to use the same key for both the spill
> files and WAL because only one process encrypt/decrypt spill files. We
> can use something like temporary key for that use case, which is used
> by only one process and lives during process lifetime (or transaction
> lifetime). The same is true for other temporary files such as
> tuplesort and tuplestore, although maybe we need tricks for shared
> tuplestore.

Agreed.  For a shared tuplestore you need a key that is shared between
the processes involved, but it doesn't need to be the same as any other
key.  For anything that is accessed by only a single process, that
process can just generate any old key and, as long as it's secure, it's
fine.

For the WAL, you could potentially create a new WAL record type that is
basically an encrypted wrapper around another WAL record.  So if table
X is encrypted with key K1, then all of the WAL records for table X are
wrapped inside of an encrypted-record WAL record that is encrypted with
key K1.  That's useful for people who want fine-grained encryption only
of certain data.  But for people who want to just encrypt everything,
you need to encrypt the entire WAL stream, all SLRU data, etc., and
that pretty much all has to be one key (or sub-keys derived from that
one key somehow).

> > Or what do you do
> > about SLRUs or other global structures? If you just exclude that
> > stuff from the scope of encryption, then you aren't helping the people
> > who want to Just Encrypt Everything.
>
> Why do people want to just encrypt everything? For satisfying some
> security compliance?

Yeah, I think so.  Perhaps an encrypted filesystem is a better way to
go, but some people want something that is built into the database
server.  The motivation seems to be mostly that they have a compliance
requirement -- either the database itself encrypts everything, or they
cannot use the software.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Index Skip Scan
> On Thu, Feb 28, 2019 at 10:45 PM Jeff Janes wrote: > > This version of the patch can return the wrong answer. Yes, indeed. In fact it answers my previous question related to the backward cursor scan, when while going back we didn't skip enough. Within the current approach it can be fixed by proper skipping for backward scan, something like in the attached patch. Although there are still some rough edges, e.g. going forth, back and forth again leads to a sutiation, when `_bt_first` is not applied anymore and the first element is wrongly skipped. I'll try to fix it with the next version of patch. > If we accept this patch, I hope it would be expanded in the future to give > similar performance as the above query does even when the query is written in > its more natural way of: Yeah, I hope the current approach with a new index am routine can be extended for that. > Without the patch, instead of getting a wrong answer, I get an error: Right, as far as I can see without a skip scan and SCROLL, a unique + index scan is used, where amcanbackward is false by default. So looks like it's not really patch related. v10-0001-Index-skip-scan.patch Description: Binary data
Re: Fsync-before-close thought experiment
On Sun, Mar 3, 2019 at 6:31 PM Thomas Munro wrote:
> A more obvious approach that probably moves us closer to the way
> kernel developers expect us to write programs is to call fsync()
> before close() (due to vfd pressure) if you've written.

Interesting.

> The obvious
> problem with that is that you could finish up doing loads more
> fsyncing than we're doing today if you're regularly dirtying more than
> max_files_per_process in the same backend.

Check.

> I wonder if we could make
> it more acceptable like so:
>
> * when performing writes, record the checkpointer's cycle counter in
>   the File
>
> * when closing due to vfd pressure, only call fsync() if the cycle
>   hasn't advanced (that is, you get to skip the fsync() if you haven't
>   written in this sync cycle, because the checkpointer must have taken
>   care of your writes from the previous cycle)

Hmm, OK.

> * if you find you're doing this too much (by default, dirtying more
>   than 1k relation segments per checkpoint cycle), maybe log a warning
>   that the user might want to consider increasing max_files_per_process
>   (and related OS limits)
>
> A possible improvement, stolen from the fd-passing patch, is to
> introduce a "sync cycle", separate from checkpoint cycle, so that the
> checkpointer can process its table of pending operations more than
> once per checkpoint if that would help minimise fsyncing in foreground
> processes. I haven't thought much about how exactly that would work.

Yeah, that seems worth considering.  I suppose that a backend could
keep track of how many times it's recorded the current sync cycle in a
File that is still open -- this seems like it should be pretty simple
and cheap, provided we can find the right places to put the counter
adjustments.  If that number gets too big, like say greater than 80% of
the number of fds, it sends a ping to the checkpointer.  I'm not sure
if that would then immediately trigger a full sync cycle or if there is
something more granular we could do.
> There is a less obvious problem with this scheme though:
>
> 1. Suppose the operating system has a single error flag for a file no
> matter how many fds have it open, and it is cleared by the first
> syscall to return EIO. There is a broken schedule like this (B =
> regular backend, C = checkpointer):
>
> B: fsync() -> EIO   # clears error flag
> C: fsync() -> success, log checkpoint
> B: PANIC!
>
> 2. On an operating system that has an error counter + seen flag
> (Linux 4.14+), in general you receive the error in all fds, which is a
> very nice property, but we'd still have a broken schedule involving
> the seen flag:
>
> B: fsync() -> EIO   # clears seen flag
> C: open()           # opened after error
> C: fsync() -> success, log checkpoint
> B: PANIC!
>
> Here's one kind of interlocking that might work: Hash pathnames and
> map to an array of lwlocks + error flags. Any process trying to sync
> a file must hold the lock and check for a pre-existing error flag.
> Now a checkpoint cannot succeed if any backend has recently decided to
> panic. You could skip that if data_sync_retry = on.

That might be fine, but I think it might be possible to create
something more light-weight.  Suppose we just decide that foreground
processes always win and the checkpointer has to wait before logging
the checkpoint.  To that end, a foreground process advertises in shared
memory whether or not it is currently performing an fsync.  The
checkpointer must observe each process until it sees that process in
the not-fsyncing state at least once.  If a process starts fsync-ing
after being observed not-fsyncing, it just means that the backend
started doing an fsync() after the checkpointer had completed the
fsyncs for that checkpoint.
And in that case the checkpointer would have observed the EIO for any writes prior to the checkpoint, so it's OK to write that checkpoint; it's only the next one that has an issue, and the fact that we're now advertising that we are fsync()-ing again will prevent that one from completing before we emit any necessary PANIC. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company