Re: pg_partition_tree crashes for a non-defined relation
On Thu, Feb 28, 2019 at 11:50:16PM -0500, Tom Lane wrote:
> FWIW, I don't agree with Michael's suggestion above.  A plain table is
> significantly different from a partitioned table with no children:
> you can store rows in the former but not the latter, and you can add
> partitions to the latter but not the former.  So conflating the two
> doesn't seem likely to lead to any good outcome.

Okay, so the opinion is moving in this direction.  Would something like
the attached patch, based on Amit's suggestion, be enough then?  The
routine names and comments looked fine to me, so I have not touched the
surroundings; comments are welcome.

> But, having said that, we've learned that it's generally bad for
> catalog-query functions to fail outright just because they're pointed
> at the wrong kind of catalog object.  So I think that what we want here
> is for pg_partition_tree to return NULL or an empty set or some such
> for a plain table, while its output for a childless partitioned table
> should be visibly different from that.  I'm not wedded to details
> beyond that idea.

Yep, that's been the intention since cc53123.  I won't go back to
returning an ERROR in any case.
Here is what the patch gives for childless partitions FWIW:
=# CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a);
CREATE TABLE
=# SELECT * FROM pg_partition_tree('ptif_test');
   relid   | parentrelid | isleaf | level
-----------+-------------+--------+-------
 ptif_test | null        | f      |     0
(1 row)
--
Michael

diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c
index 36d9f69cbc..a2fe4f34b6 100644
--- a/src/backend/utils/adt/partitionfuncs.c
+++ b/src/backend/utils/adt/partitionfuncs.c
@@ -35,17 +35,17 @@ static bool
 check_rel_can_be_partition(Oid relid)
 {
 	char		relkind;
+	bool		relispartition;
 
 	/* Check if relation exists */
 	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
 		return false;
 
 	relkind = get_rel_relkind(relid);
+	relispartition = get_rel_relispartition(relid);
 
 	/* Only allow relation types that can appear in partition trees. */
-	if (relkind != RELKIND_RELATION &&
-		relkind != RELKIND_FOREIGN_TABLE &&
-		relkind != RELKIND_INDEX &&
+	if (!relispartition &&
 		relkind != RELKIND_PARTITIONED_TABLE &&
 		relkind != RELKIND_PARTITIONED_INDEX)
 		return false;
@@ -189,13 +189,6 @@ pg_partition_root(PG_FUNCTION_ARGS)
 	if (!check_rel_can_be_partition(relid))
 		PG_RETURN_NULL();
 
-	/*
-	 * If the relation is not a partition (it may be the partition parent),
-	 * return itself as a result.
-	 */
-	if (!get_rel_relispartition(relid))
-		PG_RETURN_OID(relid);
-
 	/* Fetch the top-most parent */
 	ancestors = get_partition_ancestors(relid);
 	rootrelid = llast_oid(ancestors);
diff --git a/src/test/regress/expected/partition_info.out b/src/test/regress/expected/partition_info.out
index 73269ffd09..24e3fe60f3 100644
--- a/src/test/regress/expected/partition_info.out
+++ b/src/test/regress/expected/partition_info.out
@@ -138,19 +138,18 @@ SELECT relid, parentrelid, level, isleaf
 (6 rows)
 
 DROP TABLE ptif_test;
--- Table that is not part of any partition tree is the only member listed.
+-- Table that is not part of any partition tree is not listed.
 CREATE TABLE ptif_normal_table(a int);
 SELECT relid, parentrelid, level, isleaf
   FROM pg_partition_tree('ptif_normal_table');
-       relid       | parentrelid | level | isleaf 
--------------------+-------------+-------+--------
- ptif_normal_table |             |     0 | t
-(1 row)
+ relid | parentrelid | level | isleaf 
+-------+-------------+-------+--------
+(0 rows)
 
 SELECT pg_partition_root('ptif_normal_table');
  pg_partition_root 
--------------------
- ptif_normal_table
+ 
 (1 row)
 
 DROP TABLE ptif_normal_table;
diff --git a/src/test/regress/sql/partition_info.sql b/src/test/regress/sql/partition_info.sql
index 119b90afe4..d9dfa5d5d7 100644
--- a/src/test/regress/sql/partition_info.sql
+++ b/src/test/regress/sql/partition_info.sql
@@ -64,7 +64,7 @@ SELECT relid, parentrelid, level, isleaf
 DROP TABLE ptif_test;
 
--- Table that is not part of any partition tree is the only member listed.
+-- Table that is not part of any partition tree is not listed.
 CREATE TABLE ptif_normal_table(a int);
 SELECT relid, parentrelid, level, isleaf
   FROM pg_partition_tree('ptif_normal_table');
Re: readdir is incorrectly implemented at Windows
On Fri, Mar 01, 2019 at 10:23:02AM +0300, Konstantin Knizhnik wrote:
> Yes, Yuri Kurenkov and Grigory Smalking did a lot in investigation
> of this problem.
> (the irony is that the problem detected by Yuri was caused by
> another bug in pg_probackup, but we thought that it was related with
> permissions and came to this issue).

Thanks for confirming, Konstantin.  Let's wait a couple of days to see
if anybody has objections or comments, and then I'll try to commit this
patch.
--
Michael
Re: readdir is incorrectly implemented at Windows
On 01.03.2019 9:13, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 11:15:53AM +0900, Michael Paquier wrote:
>> Could you add it to the next commit fest as a bug fix please?  I think
>> that I will be able to look at that in details soon, but if not it
>> would be better to not lose track of your fix.
>
> Okay, I have looked at your patch.  And double-checked on Windows.  To
> put it short, I agree with the approach you are taking.
>
> I have been curious about the mention to MinGW in the existing code as
> well as in the patch you are proposing, and I have checked whether your
> patch and the current state of the code are correct, and I think that
> HEAD is wrong on one thing.
>
> First, the mingw64 code can be found here:
> https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/
> https://github.com/Alexpux/mingw-w64/
> Then, the implementation of readdir/opendir/closedir can be found in
> mingw-w64-crt/misc/dirent.c.
>
> Looking at their implementation of readdir, I can see two things:
> 1) When beginning a search in a directory, _tfindfirst gets used, which
> returns ENOENT as error if no matches are found:
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/findfirst-functions?view=vs-2017
> So from this point of view your patch is right: you make readdir()
> return errno=0, which matches the normal *nix behavior, and MinGW does
> not do that.
> 2) On follow-up lookups, MinGW code uses _tfindnext, and actually
> *enforces* errno=0 when seeing ERROR_NO_MORE_FILES.  So from this point
> of view the code's comment in HEAD is incorrect as of today.  The
> current implementation exists since 399a36a, so perhaps MinGW was not
> like that when dirent.c was added in src/port/, but that's not true
> today.  So let's fix the comment at the same time.
>
> Attached is an updated patch with my suggestions.  Does it look fine to
> you?

Yes, certainly.

> Also, I think that we should credit Yuri Kurenkov for the discovery of
> the issue, with yourself, Konstantin, as the author of the patch.  Are
> there other people involved which should be credited?  Like Grigory?

Yes, Yuri Kurenkov and Grigory Smalking did a lot in the investigation
of this problem.
(The irony is that the problem detected by Yuri was caused by another
bug in pg_probackup, but we thought that it was related with permissions
and came to this issue.)

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: get_controlfile() can leak fds in the backend
Hello Andres,

> I think putting this into the control file is a seriously bad idea.
> Postmaster interlocks against other postmasters running via
> postmaster.pid.  Having a second interlock mechanism, in a different
> file, doesn't make any sort of sense.  Nor does it seem sane to have
> external tool write over data as INTENSELY critical as the control
> file, when they then have to understand CRCs etc.

On this point, there are functions for that, get/update_controlfile, so
this could be factored out.

The initial incentive for raising the issue, probably in the wrong
thread and without a clear understanding of the matter, is that I have
been reviewing a patch to enable/disable checksums on a stopped cluster.
The patch updates all the checksums in all the files, and changes the
control file to tell that there are checksums.  Currently it checks for
and creates a fake "postmaster.pid" file to attempt to prevent another
tool from running concurrently with this operation, with what seems to
me a procedure prone to race conditions, and thus one that does not
guarantee that it would be the only tool running on the cluster.  This
looked to me like a bad hack.

Given that other commands that operate on a cluster seemed to use the
control file to signal that they are doing something, I thought that
this would be the way to go, but then I noticed that the control file
read/write procedure looks as bad as the postmaster.pid hack for
ensuring that only one command is active.

Nevertheless, I'm ranting in the wrong thread and it seems that there is
no real problem to solve, so I'll say fine to the to-me-unconvincing
"postmaster.pid" hack proposed in the patch.

--
Fabien.
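The race being alluded to above is the classic check-then-create window:
two tools can both observe that postmaster.pid is absent and then both
create it.  A minimal stand-alone sketch of how an atomic
O_CREAT|O_EXCL creation closes that window (this is not PostgreSQL
code; the function name and error handling are illustrative only):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Try to take an exclusive lock by creating the lock file atomically.
 * O_CREAT|O_EXCL makes open() fail if the file already exists, so two
 * processes cannot both "win" the way they can with a separate
 * stat()-then-create sequence.  Returns the fd on success, or -1 if
 * another process already holds the lock (or the open failed).
 */
static int acquire_lockfile(const char *path)
{
	int			fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return -1;				/* already locked, or other error */
	dprintf(fd, "%ld\n", (long) getpid());
	return fd;
}
```

The essential property is that the existence check and the creation are
a single kernel operation, so there is no window in which a second tool
can slip in between them.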
Re: 2019-03 Starts Tomorrow
On 3/1/19 8:19 AM, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 10:47:06PM -0500, Tom Lane wrote:
>> Michael Paquier writes:
>>> So do we have anybody willing to take the glorious position of CFM
>>> for this commit fest?
>>
>> IIRC, Steele already said he'd do it.
>
> Okay, fine for me of course if that's the case!  For what it's worth,
> I did not understand that he wanted to be CFM, I just understood that
> this email was a reminder that the CF will begin...  These are quite
> separate things.  Sorry for being confused.

Not at all.  I volunteered on the thread closing out the last CF, so it
wasn't that obvious.  So, yes, I am the CFM.

--
-David
da...@pgmasters.net
Re: 2019-03 Starts Tomorrow
On Thu, Feb 28, 2019 at 10:47:06PM -0500, Tom Lane wrote:
> Michael Paquier writes:
>> So do we have anybody willing to take the glorious position of CFM for
>> this commit fest?
>
> IIRC, Steele already said he'd do it.

Okay, fine for me of course if that's the case!  For what it's worth, I
did not understand that he wanted to be CFM, I just understood that this
email was a reminder that the CF will begin...  These are quite separate
things.  Sorry for being confused.
--
Michael
Re: Prevent extension creation in temporary schemas
On Thu, Feb 28, 2019 at 10:52:52PM -0500, Tom Lane wrote:
> If you're suggesting that we disable that security restriction
> during extension creation, I really can't see how that'd be a
> good thing ...

No, I don't mean that.  I was just wondering if someone can set
search_path within the SQL script which includes the extension contents
to bypass the restriction and the error.  They can always prefix such
objects with pg_temp anyway if need be...
--
Michael
Re: readdir is incorrectly implemented at Windows
On Thu, Feb 28, 2019 at 11:15:53AM +0900, Michael Paquier wrote:
> Could you add it to the next commit fest as a bug fix please?  I think
> that I will be able to look at that in details soon, but if not it
> would be better to not lose track of your fix.

Okay, I have looked at your patch.  And double-checked on Windows.  To
put it short, I agree with the approach you are taking.

I have been curious about the mention to MinGW in the existing code as
well as in the patch you are proposing, and I have checked whether your
patch and the current state of the code are correct; I think that HEAD
is wrong on one thing.

First, the mingw64 code can be found here:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/
https://github.com/Alexpux/mingw-w64/
Then, the implementation of readdir/opendir/closedir can be found in
mingw-w64-crt/misc/dirent.c.

Looking at their implementation of readdir, I can see two things:
1) When beginning a search in a directory, _tfindfirst gets used, which
returns ENOENT as error if no matches are found:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/findfirst-functions?view=vs-2017
So from this point of view your patch is right: you make readdir()
return errno=0, which matches the normal *nix behavior, and MinGW does
not do that.
2) On follow-up lookups, MinGW code uses _tfindnext, and actually
*enforces* errno=0 when seeing ERROR_NO_MORE_FILES.  So from this point
of view the code's comment in HEAD is incorrect as of today.  The
current implementation exists since 399a36a, so perhaps MinGW was not
like that when dirent.c was added in src/port/, but that's not true
today.  So let's fix the comment at the same time.

Attached is an updated patch with my suggestions.  Does it look fine to
you?  Also, I think that we should credit Yuri Kurenkov for the
discovery of the issue, with yourself, Konstantin, as the author of the
patch.  Are there other people involved who should be credited?  Like
Grigory?
--
Michael

diff --git a/src/port/dirent.c b/src/port/dirent.c
index 7a91450695..191fd062a5 100644
--- a/src/port/dirent.c
+++ b/src/port/dirent.c
@@ -83,7 +83,13 @@ readdir(DIR *d)
 		d->handle = FindFirstFile(d->dirname, &d->fd);
 		if (d->handle == INVALID_HANDLE_VALUE)
 		{
-			errno = ENOENT;
+			if (GetLastError() == ERROR_FILE_NOT_FOUND)
+			{
+				/* No files at all, force errno=0 (unlike mingw) */
+				errno = 0;
+				return NULL;
+			}
+			_dosmaperr(GetLastError());
 			return NULL;
 		}
 	}
@@ -93,7 +99,7 @@ readdir(DIR *d)
 	{
 		if (GetLastError() == ERROR_NO_MORE_FILES)
 		{
-			/* No more files, force errno=0 (unlike mingw) */
+			/* No more files, force errno=0 (like mingw) */
 			errno = 0;
 			return NULL;
 		}
Re: [HACKERS] CLUSTER command progress monitor
(2019/03/01 14:17), Tatsuro Yamada wrote:
> Attached patch is wip patch.  Is it possible to remove the following
> patch?  Because I registered the patch twice on CF Mar.
> https://commitfest.postgresql.org/22/2049/

Please remove the above and keep this:
https://commitfest.postgresql.org/22/1779/
which I moved from the January CF to the March one on his behalf.

Best regards,
Etsuro Fujita
Re: jsonpath
On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov wrote:
> I can also offer to explicitly pass timezone info into jsonpath
> functions using a special user datatype encapsulating struct pg_tz.

A more interesting question is what would be the source of the timezone.
Even if you encapsulate the timezone in a separate datatype, the
expression will still be just stable, assuming the timezone is generated
by a stable subexpression.  What we actually need is an immutable
timezone.  Once a timezone is updated, you create a new timezone
version, while the old version stays immutable.  Then, if jsonpath is
given a particular *timezone version*, it might remain immutable.  But
that requires significant rework of our timezone infrastructure.

> But a simple integer timezone offset can be passed now using jsonpath
> variables (the standard says only about integer timezone offsets; it
> also requires presence of a timezone offset in the input string if the
> format string contains timezone components):
>
> =# SELECT jsonb_path_query(
>      '"28-02-2019 12:34"',
>      '$.datetime("DD-MM-YYYY HH24:MI TZH", $tz)',
>      jsonb_build_object('tz', EXTRACT(TIMEZONE FROM now()))
>    );
>
>       jsonb_path_query
> -----------------------------
>  "2019-02-28T12:34:00+03:00"
> (1 row)

The standard specifies a fixed offset to be given for a *particular
datetime*.  For instance, if the json contains the offset in a separate
attribute or whatever, then it's OK to use such a two-argument
.datetime() method.  But that seems like quite a narrow use case.  The
standard doesn't mean you get a fixed offset extracted from "now()" and
apply it to random datetimes in your json collection.  That would work
correctly for real timezones only when they are fixed offsets, and
almost none of them are!  So that's just plain wrong; we should never
encourage users to do something like this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Remove Deprecated Exclusive Backup Mode
On Thu, Feb 28, 2019 at 7:51 PM Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
>> El 28/2/19 a las 15:13, David Steele escribió:
>> +
>> +    The exclusive backup method is deprecated and should be avoided in favor
>> +    of the non-exclusive backup method or
>> +    pg_basebackup.
>> +
>>
>> Isn't pg_basebackup a non-exclusive backup method?
>
> It seems to me that it is exactly what this sentence means.  Perhaps
> it should be split into two sentences for better clarity?

Comments on the documentation patch should be placed on the thread
linked to the documentation patch as opposed to here.

David J.
Re: [HACKERS] Block level parallel vacuum
On Thu, Feb 28, 2019 at 2:44 AM Robert Haas wrote:
>
> On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada wrote:
> > Thank you.  Attached the rebased patch.
>
> Here are some review comments.

Thank you for reviewing the patches!

> +    started by a single utility command.  Currently, the parallel
> +    utility commands that support the use of parallel workers are
> +    CREATE INDEX and VACUUM
> +    without FULL option, and only when building
> +    a B-tree index.  Parallel workers are taken from the pool of
>
> That sentence is garbled.  The end part about b-tree indexes applies
> only to CREATE INDEX, not to VACUUM, since VACUUM does not build
> indexes.

Fixed.

> +    Vacuum index and cleanup index in parallel
> +    N background workers (for the detail
> +    of each vacuum phases, please refer to <xref linkend="vacuum-phases"/>.  If the
>
> I have two problems with this.  One is that I can't understand the
> English very well.  I think you mean something like: "Perform the
> 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
> N background workers," but I'm not entirely sure.  The other is that
> if that is what you mean, I don't think it's a sufficient description.
> Users need to understand whether, for example, only one worker can be
> used per index, or whether the work for a single index can be split
> across workers.
>
> +    parallel degree N is omitted,
> +    then VACUUM decides the number of workers based on
> +    number of indexes on the relation which further limited by
> +    .  Also if this option
>
> Now this makes it sound like it's one worker per index, but you could
> be more explicit about it.

Fixed.

> +    is specified multile times, the last parallel degree
> +    N is considered into the account.
>
> Typo, but I'd just delete this sentence altogether; the behavior if
> the option is multiply specified seems like a triviality that need not
> be documented.

Understood, removed.
> +    Setting a value for parallel_workers via
> +    also controls how many parallel
> +    worker processes will be requested by a VACUUM
> +    against the table.  This setting is overwritten by setting
> +    N of PARALLEL
> +    option.
>
> I wonder if we really want this behavior.  Should a setting that
> controls the degree of parallelism when scanning the table also affect
> VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> table to be parallel by default, but rather something that the user
> must explicitly request.  Happy to hear other opinions.  If we do want
> this behavior, I think this should be written differently, something
> like this: The PARALLEL N option to VACUUM takes precedence over this
> option.

For example, I can imagine a use case where a batch job does parallel
vacuum on some tables in a maintenance window.  The batch operation
would need to compute and specify the degree of parallelism every time,
according to, for instance, the number of indexes, which would be
troublesome.  But if we can set the degree of parallelism for each
table, it can just do 'VACUUM (PARALLEL)'.

> + * parallel mode nor destories the parallel context.  For updating the index
>
> Spelling.

Fixed.

> +/* DSM keys for parallel lazy vacuum */
> +#define PARALLEL_VACUUM_KEY_SHARED		UINT64CONST(0xFFF1)
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES	UINT64CONST(0xFFF2)
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT	UINT64CONST(0xFFF3)
>
> Any special reason not to use just 1, 2, 3 here?  The general
> infrastructure stuff uses high numbers to avoid conflicting with
> plan_node_id values, but end clients of the parallel infrastructure
> can generally just use small integers.

It seems that I was worrying unnecessarily; changed to 1, 2, 3.

> + bool	updated;	/* is the stats updated? */
>
> is -> are
>
> + * LVDeadTuples controls the dead tuple TIDs collected during heap scan.
>
> what do you mean by "controls", exactly?  stores?

Fixed.
> + * This is allocated in a dynamic shared memory segment when parallel
> + * lazy vacuum mode, or allocated in a local memory.
>
> If this is in DSM, then max_tuples is a wart, I think.  We can't grow
> the segment at that point.  I'm suspicious that we need a better
> design here.  It looks like you gather all of the dead tuples in
> backend-local memory and then allocate an equal amount of DSM to copy
> them.  But that means that we are using twice as much memory, which
> seems pretty bad.  You'd have to do that at least momentarily no
> matter what, but it's not obvious that the backend-local copy is ever
> freed.

Hmm, the current design is simpler than that: only the leader process
scans the heap and saves dead tuple TIDs to DSM.  The DSM is allocated
once when starting lazy vacuum, and we never need to enlarge it.  Also,
we can use the same code around heap vacuuming and dead tuple collection
for both single-process vacuum and parallel vacuum.  Once index
Re: Protect syscache from bloating with negative cache entries
Robert> This email thread is really short on clear demonstrations that
Robert> X or Y is useful.

It is useful when the whole database does **not** crash, isn't it?

Case A (== current PostgreSQL mode): the syscache grows, then the OOM
killer chimes in, kills the database process, and that leads to a
complete cluster failure (all other PG processes terminate themselves).

Case B (== limit the syscache to 10MiB or whatever, as Tsunakawa,
Takayuki asks): a single ill-behaved process works a bit slower and/or
consumes more CPU than the other ones.  The whole DB is still alive.

I'm quite sure "case B" is much better for the end users and for the
database administrators.  So, +1 to Tsunakawa, Takayuki: it would be so
great if there was a way to limit the memory consumption of a single
process (e.g. syscache, work_mem, etc.).

Robert> However, memory usage is quite unpredictable.  It depends on
Robert> how many backends are active

The number of backends can be limited by ensuring proper limits at the
application connection pool level and/or pgbouncer and/or things like
that.

Robert> how many copies of work_mem and/or
Robert> maintenance_work_mem are in use

There might be other patches to cap the total use of
work_mem/maintenance_work_mem.

Robert> I don't think we
Robert> can say that just imposing a limit on the size of the system
Robert> caches is going to be enough to reliably prevent an out of
Robert> memory condition

The fewer possibilities there are for OOM, the better.  Quite often it
is much better to fail a single SQL statement than to kill all the DB
processes.

Vladimir
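The "case B" behavior argued for above — keep working, just evict when
a cap is hit — is what any bounded cache does.  A toy sketch of the
idea in C (this is not PostgreSQL's syscache; the names, the entry
count standing in for a byte limit, and the linear scans are all
invented for illustration): a fixed-capacity cache that evicts the
least-recently-used entry instead of growing without bound.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAP 3				/* max entries; stands in for a byte cap */

struct entry
{
	int			key;
	int			value;
	unsigned long last_used;	/* logical clock for LRU ordering */
	int			valid;
};

static struct entry cache[CACHE_CAP];
static unsigned long clock_tick;

/* Look up key; returns a pointer to the value, or NULL on a miss. */
static int *cache_lookup(int key)
{
	for (int i = 0; i < CACHE_CAP; i++)
		if (cache[i].valid && cache[i].key == key)
		{
			cache[i].last_used = ++clock_tick;
			return &cache[i].value;
		}
	return NULL;
}

/* Insert key/value, evicting the least-recently-used slot when full. */
static void cache_insert(int key, int value)
{
	int			victim = 0;

	for (int i = 0; i < CACHE_CAP; i++)
	{
		if (!cache[i].valid)
		{
			victim = i;			/* free slot: no eviction needed */
			break;
		}
		if (cache[i].last_used < cache[victim].last_used)
			victim = i;			/* oldest entry seen so far */
	}
	cache[victim].key = key;
	cache[victim].value = value;
	cache[victim].last_used = ++clock_tick;
	cache[victim].valid = 1;
}
```

The trade-off is exactly the one described in the thread: a capped
cache turns unbounded memory growth into extra lookups (cache misses),
which is usually the far cheaper failure mode.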
Re: [HACKERS] CLUSTER command progress monitor
The attached patch is a WIP patch.  Is it possible to remove the
following patch?  I registered the patch twice in the March CF:
https://commitfest.postgresql.org/22/2049/

Thanks,
Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On 2019/02/23 6:02, Robert Haas wrote:
> On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada wrote:
>> This patch is rebased on HEAD.  I'll tackle revising the patch based
>> on feedbacks next month.
>
> + Running VACUUM FULL is listed in pg_stat_progress_cluster
> + view because it uses CLUSTER command internally.  See .
>
> It's not really true to say that VACUUM FULL uses the CLUSTER command
> internally.  It uses a good chunk of the same infrastructure, but it
> certainly doesn't use the actual command, and it's not really the
> exact same thing either, because internally it's doing a sequential
> scan but no sort, which never happens with CLUSTER.  I'm not sure
> exactly how to rephrase this, but I think we need to make it more
> precise.  One idea is that maybe we should try to think of a design
> that could also handle the rewriting variants of ALTER TABLE, and
> call it pg_stat_progress_rewrite.  Maybe that's moving the goalposts
> too far, but I'm not saying we'd necessarily have to do all the work
> now, just have a view that we think could also handle that.  Then
> again, maybe the needs are too different.

Hmm..., I see.  If possible, I'd like to leave VACUUM FULL aside to
avoid complicating the implementation.  For now, I haven't enough time
to design pg_stat_progress_rewrite.  I suppose that it's tough work.

> + Whenever CLUSTER is running, the
> + pg_stat_progress_cluster view will contain
> + one row for each backend that is currently clustering or vacuuming (VACUUM FULL).
>
> That sentence contradicts itself.  Just say that it contains a row
> for each backend that is currently running CLUSTER or VACUUM FULL.

Fixed.

> @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
>  void
>  cluster(ClusterStmt *stmt, bool isTopLevel)
>  {
> +
>  	if (stmt->relation != NULL)
>  	{
>  		/* This is the single-relation case. */
>
> Useless hunk.

Fixed.

> @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
>  		heap_close(rel, NoLock);
>
>  		/* Do the job. */
> +		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
>  		cluster_rel(tableOid, indexOid, stmt->options);
> +		pgstat_progress_end_command();
>  	}
>  	else
>  	{
>
> It seems like that stuff should be inside cluster_rel().

Fixed.

> +	/* Set reltuples to total_tuples */
> +	pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
> +								 OldHeap->rd_rel->reltuples);
>
> I object.  If the user wants that, they can get it from pg_class
> themselves via an SQL query.  It's also an estimate, not something we
> know to be accurate; I want us to only report facts here, not
> theories.

I understand that the progress monitor should only report facts, so I
removed that code.

> +	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
> +								 PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
> +	pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
> +								 PROGRESS_CLUSTER_METHOD_INDEX_SCAN);
>
> I think you should use pgstat_progress_update_multi_param() if
> updating multiple parameters at the same time.  Also, some lines in
> this patch, such as this one, are very long.  Consider techniques to
> reduce the line length to 80 characters or less, such as inserting a
> line break between the two arguments.

Fixed.

> +	/* Set scan_method to NULL */
> +	pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);
>
> NULL and -1 are not the same thing.

Oops, fixed.

> I think that we shouldn't have both
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
> PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's
> just use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better
> yet, let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
> PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
> PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
> simpler.

Fixed.

> I agree that it's acceptable to report
> PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
> PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
> understand why it's valuable to do so in the context of a progress
> indicator.
Actually, I'm not sure why I added it, since so much time has passed. :(
So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least.

The attached patch is a WIP patch.

Thanks!
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..8cf829e72c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -344,6 +344,14 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
 
+
+ pg_stat_progress_cluster
+ One row for each backend running
+ CLUSTER and VACUUM FULL, showing current progress.
+ See .
+
+
 
@@ -3376,9 +3384,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
  PostgreSQL has the ability to report the progress of
- certain commands during command execution.  Currently, the only command
- which supports progress reporting is VACUUM.  This may be
- expanded in
Re: pg_partition_tree crashes for a non-defined relation
Amit Langote writes:
> On 2019/03/01 9:22, Michael Paquier wrote:
>> What I am writing next sounds perhaps a bit fancy, but in my opinion a
>> normal table is itself a partition tree, made of one single member:
>> itself.

> That's what we discussed, but it seems that we ended up allowing
> regular standalone tables (possibly in inheritance trees) only because
> it *appeared* to work.  Alvaro already pointed out what appears to be
> a bug in how we compute the value of level.  Instead of trying to fix
> it, I agree we should just disallow tables that are not a partitioned
> table/index or a partition (relispartition).

FWIW, I don't agree with Michael's suggestion above.  A plain table is
significantly different from a partitioned table with no children:
you can store rows in the former but not the latter, and you can add
partitions to the latter but not the former.  So conflating the two
doesn't seem likely to lead to any good outcome.

But, having said that, we've learned that it's generally bad for
catalog-query functions to fail outright just because they're pointed
at the wrong kind of catalog object.  So I think that what we want here
is for pg_partition_tree to return NULL or an empty set or some such
for a plain table, while its output for a childless partitioned table
should be visibly different from that.  I'm not wedded to details
beyond that idea.

			regards, tom lane
Re: pg_partition_tree crashes for a non-defined relation
Hi,

On 2019/03/01 9:22, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 04:32:03PM -0300, Alvaro Herrera wrote:
>> Yeah, looks good, please push.
>
> Done for this part.
>
>> I would opt for returning the empty set for legacy inheritance too.
>>
>> More generally, I think we should return empty for anything that's
>> either not RELKIND_PARTITIONED_TABLE or has relispartition set.
>
> I think that one option is to make the function return only the table
> itself if it is not a partitioned table, which would be more
> consistent with what pg_partition_root() does.
>
> What I am writing next sounds perhaps a bit fancy, but in my opinion a
> normal table is itself a partition tree, made of one single member:
> itself.

That's what we discussed, but it seems that we ended up allowing regular
standalone tables (possibly in inheritance trees) only because it
*appeared* to work.  Alvaro already pointed out what appears to be a bug
in how we compute the value of level.  Instead of trying to fix it, I
agree we should just disallow tables that are not a partitioned
table/index or a partition (relispartition).  Maybe there won't be any
use cases, so we should change that while we still can.

So, maybe change the check in check_rel_can_be_partition() as follows:

    relkind = get_rel_relkind(relid);
    relispartition = get_rel_relispartition(relid);

    /* Only allow relation types that can appear in partition trees. */
    if (!relispartition &&
        relkind != RELKIND_PARTITIONED_TABLE &&
        relkind != RELKIND_PARTITIONED_INDEX)
        return false;

Thanks,
Amit
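To make the intent of that check concrete, here is a stand-alone sketch
in C with the catalog lookups replaced by plain arguments (the enum and
the function name are invented for illustration; the real code reads
relkind and relispartition from the syscache): a relation qualifies
only if it is itself a partition, or is a partitioned table or
partitioned index — a plain standalone table does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the relkind codes discussed in the thread. */
typedef enum
{
	RELKIND_RELATION,
	RELKIND_FOREIGN_TABLE,
	RELKIND_INDEX,
	RELKIND_PARTITIONED_TABLE,
	RELKIND_PARTITIONED_INDEX
} RelKind;

/*
 * Mirror of the proposed check: any partition member qualifies, as do
 * partitioned tables and partitioned indexes; a plain standalone table
 * (relkind == RELKIND_RELATION, relispartition == false) does not.
 */
static bool rel_can_be_in_partition_tree(RelKind relkind, bool relispartition)
{
	if (!relispartition &&
		relkind != RELKIND_PARTITIONED_TABLE &&
		relkind != RELKIND_PARTITIONED_INDEX)
		return false;
	return true;
}
```

The point of the rewrite is visible in the truth table: tables and
foreign tables now pass only when they carry relispartition, instead of
passing unconditionally by relkind alone.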
Re: Prevent extension creation in temporary schemas
Michael Paquier writes:
> On Thu, Feb 28, 2019 at 10:13:17AM -0500, Tom Lane wrote:
>> Yeah, I think it's just because we won't search the pg_temp schema
>> for function or operator names, unless the calling SQL command
>> explicitly writes "pg_temp.foo(...)" or equivalent.  That's an
>> ancient security decision, which we're unlikely to undo.  It
>> certainly puts a crimp in the usefulness of putting extensions into
>> pg_temp, but I don't think it totally destroys the usefulness.
>> You could still use an extension to package, say, the definitions
>> of a bunch of temp tables and views that you need to create often.

> Even with that, it should still be possible to enforce search_path
> within the extension script to allow such objects to be created
> correctly, no?  That would be a bit hacky, still for the purpose of
> temp object handling that looks kind of enough to live with when
> creating an extension.

If you're suggesting that we disable that security restriction
during extension creation, I really can't see how that'd be a
good thing ...

			regards, tom lane
Re: 2019-03 Starts Tomorrow
Michael Paquier writes: > So do we have anybody willing to take the glorious position of CFM for > this commit fest? IIRC, Steele already said he'd do it. regards, tom lane
RE: extension patch of CREATE OR REPLACE TRIGGER
> I've made a patch to add CREATE OR REPLACE TRIGGER with some basic tests in > triggers.sql. >> I see there are two patch entries in the commitfest for this. Is that a >> mistake? If so can you "Withdraw" one of them? Oh, my bad. Sorry, this was my first time registering a patch! Please withdraw the old one, "extension patch to add OR REPLACE clause to CREATE TRIGGER". My latest version is "extension patch of CREATE OR REPLACE TRIGGER". Thanks! -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: propagating replica identity to partitions
On Thu, Feb 28, 2019 at 07:41:11PM -0300, Alvaro Herrera wrote: > We all seem to agree that REPLICA IDENTITY should recurse. (entering the ring) FWIW, I agree that having REPLICA IDENTITY recurse to partitions feels more natural, as well as being able to use ALTER TABLE ONLY to update only the definition of a given partition member. -- Michael signature.asc Description: PGP signature
Re: Remove Deprecated Exclusive Backup Mode
On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote: > On 28/2/19 at 15:13, David Steele wrote: > + > + The exclusive backup method is deprecated and should be avoided in > favor > + of the non-exclusive backup method or > + pg_basebackup. > + > > Isn't pg_basebackup a non-exclusive backup method? It seems to me that that is exactly what this sentence means. Perhaps it should be split into two sentences for better clarity? -- Michael
Re: 2019-03 Starts Tomorrow
Hi David, On Thu, Feb 28, 2019 at 11:05:33AM +0200, David Steele wrote: > The 2019-03 CF is almost upon us. The CF will officially start at 00:00 AoE > (12:00 UTC) on Friday, March 1st. Thanks for the reminder. > If you have a patch that has been Waiting on Author without any discussion > since before the last CF ended then you should submit a new patch for the > start of the 2019-03 CF. So do we have anybody willing to take the glorious position of CFM for this commit fest? -- Michael
Re: Prevent extension creation in temporary schemas
On Thu, Feb 28, 2019 at 10:13:17AM -0500, Tom Lane wrote: > Yeah, I think it's just because we won't search the pg_temp schema > for function or operator names, unless the calling SQL command > explicitly writes "pg_temp.foo(...)" or equivalent. That's an > ancient security decision, which we're unlikely to undo. It > certainly puts a crimp in the usefulness of putting extensions into > pg_temp, but I don't think it totally destroys the usefulness. > You could still use an extension to package, say, the definitions > of a bunch of temp tables and views that you need to create often. Even with that, it should still be possible to enforce search_path within the extension script to allow such objects to be created correctly, no? That would be a bit hacky, but for the purpose of temp object handling it looks acceptable enough to live with when creating an extension. -- Michael
Tighten error control for OpenTransientFile/CloseTransientFile
Hi all, Joe's message here has reminded me that we have lacked a lot of error handling around CloseTransientFile(): https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2ce...@joeconway.com This has been mentioned by Alvaro a couple of months ago (cannot find the thread about that at quick glance), and I just forgot about it at that time. Anyway, attached is a patch to do some cleanup for all that: - Switch OpenTransientFile to read-only where sufficient. - Add more error handling for CloseTransientFile A major take of this patch is to make sure that the new error messages generated have an elevel consistent with their neighbors. Just on time for this last CF. Thoughts? -- Michael diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 9905593661..7b39283c89 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -1991,7 +1991,10 @@ qtext_load_file(Size *buffer_size) return NULL; } - CloseTransientFile(fd); + if (CloseTransientFile(fd)) + ereport(LOG, +(errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", PGSS_TEXT_FILE))); *buffer_size = stat.st_size; return buf; diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index f5cf9ffc9c..bce4274362 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r) errmsg("could not fsync file \"%s\": %m", path))); pgstat_report_wait_end(); - CloseTransientFile(fd); + if (CloseTransientFile(fd)) + ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", path))); } /* --- @@ -1300,7 +1303,11 @@ CheckPointLogicalRewriteHeap(void) (errcode_for_file_access(), errmsg("could not fsync file \"%s\": %m", path))); pgstat_report_wait_end(); - CloseTransientFile(fd); + + if (CloseTransientFile(fd)) +ereport(ERROR, 
+ (errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", path))); } } FreeDir(mappings_dir); diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c index 3623352b9c..974d42fc86 100644 --- a/src/backend/access/transam/slru.c +++ b/src/backend/access/transam/slru.c @@ -599,7 +599,7 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno) SlruFileName(ctl, path, segno); - fd = OpenTransientFile(path, O_RDWR | PG_BINARY); + fd = OpenTransientFile(path, O_RDONLY | PG_BINARY); if (fd < 0) { /* expected: file doesn't exist */ @@ -621,7 +621,13 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno) result = endpos >= (off_t) (offset + BLCKSZ); - CloseTransientFile(fd); + if (CloseTransientFile(fd)) + { + slru_errcause = SLRU_CLOSE_FAILED; + slru_errno = errno; + return false; + } + return result; } @@ -654,7 +660,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno) * SlruPhysicalWritePage). Hence, if we are InRecovery, allow the case * where the file doesn't exist, and return zeroes instead. */ - fd = OpenTransientFile(path, O_RDWR | PG_BINARY); + fd = OpenTransientFile(path, O_RDONLY | PG_BINARY); if (fd < 0) { if (errno != ENOENT || !InRecovery) diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c index c96c8b60ba..cbd9b2cee1 100644 --- a/src/backend/access/transam/timeline.c +++ b/src/backend/access/transam/timeline.c @@ -370,7 +370,11 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, } pgstat_report_wait_end(); } - CloseTransientFile(srcfd); + + if (CloseTransientFile(srcfd)) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", path))); } /* @@ -416,7 +420,6 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI, (errcode_for_file_access(), errmsg("could not close file \"%s\": %m", tmppath))); - /* * Now move the completed history file into place with its final name. 
*/ @@ -495,7 +498,6 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size) (errcode_for_file_access(), errmsg("could not close file \"%s\": %m", tmppath))); - /* * Now move the completed history file into place with its final name. */ diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 64679dd2de..21986e48fe 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok) } pgstat_report_wait_end(); - CloseTransientFile(fd); + + if (CloseTransientFile(fd)) + ereport(ERROR, +(errcode_for_file_access(), + errmsg("could not close file \"%s\": %m", path))); hdr = (TwoPhaseFileHeader *) buf; if (hdr->magic != TWOPHASE_MAGIC) diff --git
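The theme of the patch, never ignoring the result of a close, can be sketched outside of C (illustrative Python; CloseTransientFile() itself is a PostgreSQL wrapper around close(), and the helper name below is made up):

```python
import os, tempfile

def write_then_close(path, data):
    """Illustrative sketch (not PostgreSQL code): a close can report
    deferred I/O errors, so its result must be checked just like the
    results of write() and fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        try:
            os.close(fd)  # analogue of checking CloseTransientFile()'s result
        except OSError as e:
            raise RuntimeError('could not close file "%s": %s' % (path, e)) from e

path = os.path.join(tempfile.mkdtemp(), "demo")
write_then_close(path, b"hello")
with open(path, "rb") as f:
    assert f.read() == b"hello"
```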
Re: NOT IN subquery optimization
On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng wrote: > Resend the patch with a whitespace removed so that "git apply patch" works > directly. > > Hi Zheng, I have reviewed your patch. Good job, except for two issues I can find: 1. The patch would give wrong results when the inner side is empty. In this case, all rows from the outer side should appear in the output, but with the patch we will lose the NULLs from the outer side. 2. Because of the newly added predicate 'OR (var is NULL)', we cannot use hash join or merge join to do the ANTI JOIN. Nested loop becomes the only choice, which is inefficient. Thanks Richard
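To see why both review points hold, SQL's NOT IN semantics can be sketched in Python (a didactic helper, not the patch's code):

```python
# Three-valued-logic sketch of SQL's "x NOT IN (subquery)".
# None stands in for SQL NULL / UNKNOWN.
def not_in(x, inner):
    if not inner:
        return True       # empty inner side: NOT IN is true for every outer row
    if x is None:
        return None       # NULL outer value: result is UNKNOWN, row filtered out
    if x in inner:
        return False
    if None in inner:
        return None       # a NULL on the inner side: absence cannot be proven
    return True

outer = [1, 2, None]
# With an empty inner side, every outer row (including the NULL) passes,
# so an anti-join plan that adds an "x IS NULL" filter drops rows wrongly:
assert [not_in(x, []) for x in outer] == [True, True, True]
# With a NULL on the inner side, only a definite match yields False:
assert [not_in(x, [2, None]) for x in outer] == [None, False, None]
```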
Fix memleaks and error handling in jsonb_plpython
Unfortunately, contrib/jsonb_plpython still contain a lot of problems in error handling that can lead to memory leaks: - not all Python function calls are checked for the success - not in all places PG exceptions are caught to release Python references But it seems that this errors can happen only in OOM case. Attached patch with the fix. Back-patch for PG11 is needed. -- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From e40a9012c38c3b66888791d3dd3943adf9f310c8 Mon Sep 17 00:00:00 2001 From: Nikita Glukhov Date: Tue, 15 Jan 2019 02:14:06 +0300 Subject: [PATCH] Fix memleaks and error handling in jsonb_plpython --- contrib/jsonb_plpython/jsonb_plpython.c | 146 ++-- 1 file changed, 100 insertions(+), 46 deletions(-) diff --git a/contrib/jsonb_plpython/jsonb_plpython.c b/contrib/jsonb_plpython/jsonb_plpython.c index f44d364..3143b73 100644 --- a/contrib/jsonb_plpython/jsonb_plpython.c +++ b/contrib/jsonb_plpython/jsonb_plpython.c @@ -169,53 +169,80 @@ PLyObject_FromJsonbContainer(JsonbContainer *jsonb) if (!result) return NULL; -while ((r = JsonbIteratorNext(, , true)) != WJB_DONE) +PG_TRY(); { - if (r == WJB_ELEM) + while ((r = JsonbIteratorNext(, , true)) != WJB_DONE) { - PyObject *elem = PLyObject_FromJsonbValue(); + if (r == WJB_ELEM) + { + PyObject *elem = PLyObject_FromJsonbValue(); + + if (!elem || PyList_Append(result, elem)) + { +Py_XDECREF(elem); +Py_DECREF(result); +return NULL; + } - PyList_Append(result, elem); - Py_XDECREF(elem); + Py_DECREF(elem); + } } } +PG_CATCH(); +{ + Py_DECREF(result); + PG_RE_THROW(); +} +PG_END_TRY(); } break; case WJB_BEGIN_OBJECT: - result = PyDict_New(); - if (!result) -return NULL; - - while ((r = JsonbIteratorNext(, , true)) != WJB_DONE) { -if (r == WJB_KEY) -{ - PyObject *key = PLyString_FromJsonbValue(); - - if (!key) - return NULL; +PyObject *volatile key = NULL; - r = JsonbIteratorNext(, , true); +result = PyDict_New(); +if (!result) + return NULL; - if (r == WJB_VALUE) 
+PG_TRY(); +{ + while ((r = JsonbIteratorNext(, , true)) != WJB_DONE) { - PyObject *value = PLyObject_FromJsonbValue(); + JsonbValue v2; + PyObject *val = NULL; + + if (r != WJB_KEY) + continue; + + if ((r = JsonbIteratorNext(, , true)) != WJB_VALUE) + elog(ERROR, "unexpected jsonb token: %d", r); - if (!value) + if (!(key = PLyString_FromJsonbValue()) || + !(val = PLyObject_FromJsonbValue()) || + PyDict_SetItem(result, key, val)) { Py_XDECREF(key); + Py_XDECREF(val); + Py_DECREF(result); return NULL; } - PyDict_SetItem(result, key, value); - Py_XDECREF(value); + Py_DECREF(val); + Py_DECREF(key); + key = NULL; } - +} +PG_CATCH(); +{ Py_XDECREF(key); + Py_DECREF(result); + PG_RE_THROW(); } +PG_END_TRY(); + +break; } - break; default: elog(ERROR, "unexpected jsonb token: %d", r); @@ -233,30 +260,40 @@ PLyObject_FromJsonbContainer(JsonbContainer *jsonb) static JsonbValue * PLyMapping_ToJsonbValue(PyObject *obj, JsonbParseState **jsonb_state) { - Py_ssize_t pcount; - JsonbValue *out = NULL; + JsonbValue *out; + PyObject *items; + Py_ssize_t pcount = PyMapping_Size(obj); + + if (pcount < 0) + PLy_elog(ERROR, "PyMapping_Size() failed, while converting mapping into jsonb"); - /* We need it volatile, since we use it after longjmp */ - volatile PyObject *items_v = NULL; + items = PyMapping_Items(obj); - pcount = PyMapping_Size(obj); - items_v = PyMapping_Items(obj); + if (!items) + PLy_elog(ERROR, "PyMapping_Items() failed, while converting mapping into jsonb"); PG_TRY(); { Py_ssize_t i; - PyObject *items; - - items = (PyObject *) items_v; pushJsonbValue(jsonb_state, WJB_BEGIN_OBJECT, NULL); for (i = 0; i < pcount; i++) { JsonbValue jbvKey; - PyObject *item = PyList_GetItem(items, i); - PyObject *key = PyTuple_GetItem(item, 0); - PyObject *value = PyTuple_GetItem(item, 1); + PyObject *item; + PyObject *key; + PyObject *value; + + item = PyList_GetItem(items, i); /* borrowed reference */ + if (!item) +PLy_elog(ERROR, "PyList_GetItem() failed, while converting mapping into 
jsonb"); + + key = PyTuple_GetItem(item, 0); /* borrowed references */ + value = PyTuple_GetItem(item, 1); + + if (!key || !value) +PLy_elog(ERROR, "PyTuple_GetItem() failed, while converting mapping into jsonb"); /* Python dictionary can have None as key */ if (key == Py_None) @@ -279,11 +316,13 @@ PLyMapping_ToJsonbValue(PyObject *obj, JsonbParseState **jsonb_state) } PG_CATCH(); { -
Re: pgsql: Avoid creation of the free space map for small heap relations, t
On Thu, Feb 28, 2019 at 9:59 AM John Naylor wrote: > > On Thu, Feb 28, 2019 at 10:25 AM Amit Kapila wrote: > > > > Here's an updated patch based on comments by you. I will proceed with > > this unless you have any more comments. > > Looks good to me. I would just adjust the grammar in the comment, from > "This prevents us to use the map", to "This prevents us from using the > map". Perhaps also a comma after "first". > Okay, pushed after making changes suggested by you. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Remove Deprecated Exclusive Backup Mode
On 28/2/19 at 15:13, David Steele wrote: > > It seems to me that the best way to discuss this is via a patch to the > main documentation. I've written something to get us started: > > https://commitfest.postgresql.org/22/2042/ + + The exclusive backup method is deprecated and should be avoided in favor + of the non-exclusive backup method or + pg_basebackup. + Isn't pg_basebackup a non-exclusive backup method? -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: proposal: variadic argument support for least, greatest function
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: tested, passed For completeness, I'll mark this reviewed again. It passes installcheck-world, the latest patch addresses the suggestions from me, and is improved on the code-structure matters that Tom pointed out. I don't know if it will meet Tom's threshold for desirability overall, but that sounds like a committer call at this point, so I'll change it to RfC. -Chap The new status of this patch is: Ready for Committer
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi/出利葉 健 > [Size=800, iter=1,000,000] > Master |15.763 > Patched|16.262 (+3%) > > [Size=32768, iter=1,000,000] > Master |61.3076 > Patched|62.9566 (+2%) What's the unit, second or millisecond? Why does the number of digits to the right of the decimal point differ? Is the measurement correct? I'm wondering because the difference is larger in the latter case. Isn't the accounting processing almost the same in both cases? * former: 16.262 - 15.763 = 0.499 * latter: 62.956 - 61.307 = 1.649 > At least compared to previous HashAg version, the overhead is smaller. > It has some overhead but is increase by 2 or 3% a little bit? I think the overhead is sufficiently small. It may get even smaller with a trivial tweak. You added the new member usedspace at the end of MemoryContextData. The original size of MemoryContextData is 72 bytes, and Intel Xeon's cache line is 64 bytes. So, the new member will be on a separate cache line. Try putting usedspace before the name member. Regards Takayuki Tsunakawa
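The cache-line point can be illustrated with a small sketch (the field names and the split of the first 72 bytes are stand-ins, not the real MemoryContextData layout; only the 72-byte struct size and 64-byte line size come from the mail):

```python
import ctypes

# A stand-in for the original 72-byte struct with the new member appended.
class ContextTail(ctypes.Structure):
    _fields_ = [("orig", ctypes.c_byte * 72),    # the pre-patch 72 bytes
                ("usedspace", ctypes.c_uint64)]  # new member appended at the end

assert ctypes.sizeof(ContextTail) == 80
# Appended after 72 bytes, usedspace starts at offset 72: the second
# 64-byte cache line, separate from the hot fields.
assert ContextTail.usedspace.offset == 72
assert ContextTail.usedspace.offset // 64 == 1

# Placing usedspace among the earlier members keeps it on the first line
# (the 48/24 split of the original bytes is purely illustrative).
class ContextReordered(ctypes.Structure):
    _fields_ = [("head", ctypes.c_byte * 48),
                ("usedspace", ctypes.c_uint64),  # offset 48: first cache line
                ("tail", ctypes.c_byte * 24)]

assert ContextReordered.usedspace.offset // 64 == 0
```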
Re: FETCH FIRST clause PERCENT option
Hello. At Thu, 28 Feb 2019 21:16:25 +0100, Tomas Vondra wrote in > > One biggest issue seems to be we don't know the total number of # One *of* the biggest *issues*? > > outer tuples before actually reading a null tuple. I doubt of > > general shortcut for that. It also seems preventing limit node > > from just using materialized outer. > > > > Sure, if you actually want all tuples, you'll have to execute the outer > plan till completion. But that's not what I'm talking about - what if we > only ever need to read one row from the limit? We have no choice but to read all the tuples once, even if only to find that we are to return just one row, since the estimator is not guaranteed to be exact enough for this purpose. > To give you a (admittedly, somewhat contrived and artificial example): > > SELECT * FROM t1 WHERE id IN ( > SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY > ); > > Maybe this example is bogus and/or does not really matter in practice. I > don't know, but I've been unable to convince myself that's the case. I see that kind of idiom commonly. Even in the quite simple example above, *we* cannot tell how many tuples the inner should return unless we actually fetch all tuples in t2. This is the same problem as with count(*). The query is equivalent to the following one. SELECT * FROM t1 WHERE id IN ( SELECT id FROM t2 ORDER BY x FETCH FIRST (SELECT ceil(count(*) * 0.1) FROM t2) ROWS ONLY ); This scans t2 twice, but this patch does only one full scan, moving the second, partial scan to a tuplestore. We would win if the outer side is complex enough. Anyway, even excluding the count(*) issue, it seems that we are not alone on that point. (That is, at least Oracle shows different E-Rows and A-Rows for PERCENT.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
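The point about needing the total row count first can be sketched like this (illustrative Python, not the executor code):

```python
import math

def fetch_first_percent(rows, pct):
    """Sketch: to return the first pct% of rows, the total row count must
    be known, so the input has to be read (and buffered) to completion
    before even the first output row can be counted against the limit."""
    buffered = list(rows)  # full scan into a tuplestore-like buffer
    n = math.ceil(len(buffered) * pct / 100)
    return buffered[:n]

rows = iter(range(25))  # a source whose size is unknown up front
assert fetch_first_percent(rows, 10) == [0, 1, 2]  # ceil(25 * 0.10) = 3
```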
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
Hi, thanks for checking the patches! On 02/28/19 19:36, Ramanarayana wrote: > The below statement needs to be executed before running the query to > replicate the issue > > update xmldata set data = regexp_replace(data::text, '791', > '791')::xml; If you are applying that update (and there is a SIZE element originally 791), and then receiving a "more than one value returned by column XPath expression" error, I believe you are seeing documented, correct behavior. Your update changes the content of that SIZE element to have three comment nodes and three text nodes. The query then contains this column spec: size_text float PATH 'SIZE/text()' where the target SQL column type is 'float' and the path expression will return an XML result consisting of the three text nodes. As documented, "An XML result assigned to a column of any other type may not have more than one node, or an error is raised." So I think this behavior is correct. If you do any more testing (thank you for taking the interest, by the way!), could you please add your comments, not to this email thread, but to [1]? [1] https://www.postgresql.org/message-id/3e8eab9e-7289-6c23-5e2c-153cccea2257%40anastigmatix.net That's the one that is registered to the commitfest entry, so comments made on this thread might be overlooked. Thanks! -Chap
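For anyone wanting to reproduce the three-text-node situation outside the server, it can be sketched with Python's stdlib XML parser (requires Python 3.8+ for insert_comments; an illustrative model of the XML data model, not libxml2 itself):

```python
import xml.etree.ElementTree as ET

# Interleaved comments split the element content into three separate text
# nodes, which is why the XPath SIZE/text() returns three values here.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
size = ET.fromstring("<SIZE>7<!-- a -->9<!-- b -->1</SIZE>", parser)

# .text is the text before the first child; each comment's .tail is the
# text following it.
text_nodes = [size.text] + [c.tail for c in size]
assert text_nodes == ["7", "9", "1"]  # three nodes, not one string "791"
```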
Re: get_controlfile() can leak fds in the backend
Hi, On 2019-03-01 10:11:53 +0900, Michael Paquier wrote: > One thing is that we don't protect a data folder to be started when it > is in the process of being treated by an external tool, like > pg_rewind, or pg_checksums. So having an extra flag in the control > file, which can be used by external tools to tell that the data folder > is being treated for something does not sound that crazy to me. > Having a tool write a fake postmaster.pid for this kind of task does > not sound right from a correctness point of view, because the instance > is not started. I think putting this into the control file is a seriously bad idea. Postmaster interlocks against other postmasters running via postmaster.pid. Having a second interlock mechanism, in a different file, doesn't make any sort of sense. Nor does it seem sane to have external tool write over data as INTENSELY critical as the control file, when they then have to understand CRCs etc. > Let's keep the discussions where they are by the way. Joe has just > closed the report of this thread with 4598a99, so I am moving on to > the correct places. I don't know what that means, given you replied to the above in this thread? Greetings, Andres Freund
Re: get_controlfile() can leak fds in the backend
On Thu, Feb 28, 2019 at 01:07:23PM -0800, Andres Freund wrote: > Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to > see if the backend has finished starting up etc. It's precisely what the > backend uses to prevent two postmasters to start etc. It's also what say > pg_resetwal checks to protect against a concurrently running cluster > (albeit in a racy way). If we want to make things more bulletproof, > that's the place. The control file is constantly written to, sometimes > by different processes, it'd just not be a good file for such lockout > mechanisms. Hijacking my own thread a bit more... I can do that, right? Or not. One thing is that we don't protect a data folder from being started while it is being worked on by an external tool, like pg_rewind or pg_checksums. So having an extra flag in the control file, which external tools can use to tell that the data folder is being processed, does not sound that crazy to me. Having a tool write a fake postmaster.pid for this kind of task does not sound right from a correctness point of view, because the instance is not started. And a lock API won't protect much if the host is unplugged, even if its state is made durable... Let's keep the discussions where they are, by the way. Joe has just closed the report of this thread with 4598a99, so I am moving on to the correct places. My apologies for the digressions. -- Michael
Re: get_controlfile() can leak fds in the backend
On Thu, Feb 28, 2019 at 04:09:32PM -0500, Joe Conway wrote: > Committed and pushed that way. Thanks for committing a fix. > By the way, while looking at this, I noted at least a couple of places > where OpenTransientFile() is being passed O_RDWR when the usage is > pretty clearly intended to be read-only. For example at least two > instances in slru.c -- SlruPhysicalReadPage() and > SimpleLruDoesPhysicalPageExist(). Is it worthwhile searching for and > fixing those instances? There are roughly 40~42 callers of OpenTransientFile(). Looking at them I can see that RestoreSlotFromDisk() could also switch to RDONLY instead of RDWR. I am also a bit tired of the lack of error handling around CloseTransientFile(). While in some code paths the file descriptors are closed on error, in some others we should report something. I am going to send a patch after a lookup. Let's see all that on a separate thread. -- Michael
Re: jsonpath
On 2019-03-01 03:36:49 +0300, Nikita Glukhov wrote: > Patch 1 is what we are going to commit in PG12. I think it's too early to make that determination. I think there's a good chance, but that this needs more independent review.
Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
Hi, The below statement needs to be executed before running the query to replicate the issue update xmldata set data = regexp_replace(data::text, '791', '791')::xml; On Thu, 28 Feb 2019 at 17:55, Pavel Stehule wrote: > > > čt 28. 2. 2019 v 10:31 odesílatel Pavel Stehule > napsal: > >> >> >> čt 28. 2. 2019 v 9:58 odesílatel Ramanarayana >> napsal: >> >>> Hi, >>> >>> I have tested the three issues fixed in patch 001. Array Indexes >>> issue is still there.Running the following query returns ERROR: more >>> than one value returned by column XPath expression >>> >>> SELECT xmltable.* >>> FROM (SELECT data FROM xmldata) x, >>> LATERAL XMLTABLE('/ROWS/ROW' >>> PASSING data >>> COLUMNS >>> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, >>> size_text float PATH 'SIZE/text()', >>> size_text_1 float PATH 'SIZE/text()[1]', >>> size_text_2 float PATH 'SIZE/text()[2]', >>> "SIZE" float, size_xml xml PATH 'SIZE') >>> >>> The other two issues are resolved by this patch. >>> >> > I tested xmltable-xpath-result-processing-bugfix-6.patch > > and it is working > > postgres=# SELECT xmltable.* > postgres-#FROM (SELECT data FROM xmldata) x, > postgres-# LATERAL XMLTABLE('/ROWS/ROW' > postgres(# PASSING data > postgres(# COLUMNS > postgres(# country_name text PATH > 'COUNTRY_NAME/text()' NOT NULL, > postgres(# size_text float PATH > 'SIZE/text()', > postgres(# size_text_1 float PATH > 'SIZE/text()[1]', > postgres(# size_text_2 float PATH > 'SIZE/text()[2]', > postgres(# "SIZE" float, size_xml xml > PATH 'SIZE') ; > ┌──┬───┬─┬─┬──┬┐ > > │ country_name │ size_text │ size_text_1 │ size_text_2 │ SIZE │ > size_xml │ > ╞══╪═══╪═╪═╪══╪╡ > > │ Australia│ ∅ │ ∅ │ ∅ │∅ │ > ∅ │ > │ China│ ∅ │ ∅ │ ∅ │∅ │ > ∅ │ > │ HongKong │ ∅ │ ∅ │ ∅ │∅ │ > ∅ │ > │ India│ ∅ │ ∅ │ ∅ │∅ │ > ∅ │ > │ Japan│ ∅ │ ∅ │ ∅ │∅ │ > ∅ │ > │ Singapore│ 791 │ 791 │ ∅ │ 791 │ unit="km">791 │ > └──┴───┴─┴─┴──┴┘ > > (6 rows) > > Regards > > Pavel > > >> >> what patches you are used? 
>> >> Regards >> >> Pavel >> >> >>> -- >>> Cheers >>> Ram 4.0 >>> >> -- Cheers Ram 4.0
Re: pg_partition_tree crashes for a non-defined relation
On Thu, Feb 28, 2019 at 04:32:03PM -0300, Alvaro Herrera wrote: > Yeah, looks good, please push. Done for this part. > I would opt for returning the empty set for legacy inheritance too. > > More generally, I think we should return empty for anything that's > either not RELKIND_PARTITIONED_TABLE or has relispartition set. I think that one option is to make the function return only the table itself if it is not a partitioned table, which would be more consistent with what pg_partition_root() does. What I am writing next sounds perhaps a bit fancy, but in my opinion a normal table is itself a partition tree, made of one single member: itself. -- Michael
Re: Online verification of checksums
Hi, On Thursday, 2019-02-28 at 14:29 +0100, Fabien COELHO wrote: > > So I have now changed behaviour so that short writes count as skipped > > files and pg_verify_checksums no longer bails out on them. When this > > occurs a warning is written to stderr and their overall count is also > > reported at the end. However, unless there are other blocks with bad > > checksums, the exit status is kept at zero. > > This seems fair when online, however I'm wondering whether it is when > offline. I'd say that the whole retry logic should be skipped in this > case? i.e. "if (block_retry || !online) { error message and continue }" > on both short read & checksum failure retries. Ok, the stand-alone pg_checksums program also got a PR about the LSN skip logic not being helpful when the instance is offline and somebody just writes /dev/urandom over the heap files: https://github.com/credativ/pg_checksums/pull/6 So I now tried to change the patch so that it only retries blocks when online. > Patch applies cleanly, compiles, global & local make check ok. > > I'm wondering whether it should exit(1) on "lseek" failures. Would it make > sense to skip the file and report it as such? Should it be counted as a > skippedfile? Ok, I think it makes sense to march on and I changed it that way. > WRT the final status, ISTM that skippedblocks & files could warrant an > error when offline, although they might be ok when online? Ok, also changed it that way. New patch attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr.
Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutzdiff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml index 905b8f1222..4ad6edcde6 100644 --- a/doc/src/sgml/ref/pg_verify_checksums.sgml +++ b/doc/src/sgml/ref/pg_verify_checksums.sgml @@ -37,9 +37,8 @@ PostgreSQL documentation Description pg_verify_checksums verifies data checksums in a - PostgreSQL cluster. The server must be shut - down cleanly before running pg_verify_checksums. - The exit status is zero if there are no checksum errors, otherwise nonzero. + PostgreSQL cluster. The exit status is zero if + there are no checksum errors, otherwise nonzero. diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c index 511262ab5f..7f358988ed 100644 --- a/src/bin/pg_verify_checksums/pg_verify_checksums.c +++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c @@ -1,7 +1,7 @@ /* * pg_verify_checksums * - * Verifies page level checksums in an offline cluster + * Verifies page level checksums in a cluster * * Copyright (c) 2010-2019, PostgreSQL Global Development Group * @@ -24,12 +24,16 @@ static int64 files = 0; +static int64 skippedfiles = 0; static int64 blocks = 0; static int64 badblocks = 0; +static int64 skippedblocks = 0; static ControlFileData *ControlFile; +static XLogRecPtr checkpointLSN; static char *only_relfilenode = NULL; static bool verbose = false; +static bool online = false; static const char *progname; @@ -86,10 +90,17 @@ scan_file(const char *fn, BlockNumber segmentno) PageHeader header = (PageHeader) buf.data; int f; BlockNumber blockno; + bool block_retry = false; f = open(fn, O_RDONLY | PG_BINARY, 0); if (f < 0) { + if (online && errno == ENOENT) + { + /* File was removed in the meantime */ + return; + } + fprintf(stderr, _("%s: could not open file \"%s\": %s\n"), progname, fn, strerror(errno)); 
exit(1); @@ -104,26 +115,123 @@ scan_file(const char *fn, BlockNumber segmentno) if (r == 0) break; + if (r < 0) + { + fprintf(stderr, _("%s: could not read block %u in file \"%s\": %s\n"), + progname, blockno, fn, strerror(errno)); + return; + } if (r != BLCKSZ) { - fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"), - progname, blockno, fn, r, BLCKSZ); - exit(1); + if (online) + { +if (block_retry) +{ + /* We already tried once to reread the block, skip it */ + skippedfiles++; + fprintf(stderr, _("%s: warning: could not read block %u in file \"%s\": read %d of %d\n"), + progname, blockno, fn, r, BLCKSZ); + continue; +} + +/* + * Retry the block. It's possible that we read the block while it + * was extended or shrinked, so it it ends up looking torn to us. + */ + +/* + * Seek back by the amount of bytes we read to the beginning of + * the failed block. + */ +if (lseek(f, -r, SEEK_CUR) == -1) +{ + skippedfiles++; + fprintf(stderr, _("%s: could not lseek in file
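The retry behavior in the patch can be sketched like this (an illustrative Python model of the read loop, not the C code):

```python
import io, os

BLCKSZ = 8192

def read_block_with_retry(f):
    """Sketch of the patch's online behavior: on a short read (a block
    observed while being extended may look torn), seek back and retry
    once; if the reread is still short, report the block as skipped
    instead of aborting the whole run."""
    for attempt in (1, 2):
        buf = f.read(BLCKSZ)
        if len(buf) in (0, BLCKSZ):
            return buf                    # clean EOF or a full block
        f.seek(-len(buf), os.SEEK_CUR)    # rewind the torn read
        if attempt == 2:
            return None                   # still short: skip, don't bail out

f = io.BytesIO(b"x" * (BLCKSZ + 100))     # one full block plus a torn tail
assert len(read_block_with_retry(f)) == BLCKSZ
assert read_block_with_retry(f) is None   # torn tail skipped after one retry
```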
Re: Why don't we have a small reserved OID range for patch revisions?
On Thu, Feb 28, 2019 at 3:40 PM Tom Lane wrote: > Robert Haas writes: > >>> just as a thought, what if we stopped assigning manual OIDs for new > >>> catalog entries altogether, except for once at the end of each release > >>> cycle? > > Actually ... that leads to an idea that wouldn't add any per-commit > overhead, or really much change at all to existing processes. Given > the existence of a reliable OID-renumbering tool, we could: > In this scheme, OID collisions are a problem for in-progress patches > only if two patches are unlucky enough to choose the same random > high OIDs during the same devel cycle. That's unlikely, or at least > a good bit less likely than collisions are today. That sounds like a reasonable compromise. Perhaps the unused_oids script could give specific guidance on using a randomly determined small range of contiguous OIDs that fall within the current range for that devel cycle. That would prevent collisions caused by the natural human tendency to prefer a round number. Having contiguous OIDs for the same patch seems worth preserving. -- Peter Geoghegan
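The suggested unused_oids guidance could look roughly like this (a hypothetical sketch; the helper name and the 8000-9999 range are made up for illustration):

```python
import random

def suggest_oid_range(unused, count, lo=8000, hi=9999):
    """Hypothetical sketch of the suggestion: pick a random contiguous
    run of `count` free OIDs inside the development range, so concurrent
    patches rarely collide and nobody gravitates to round numbers, while
    one patch still gets contiguous OIDs."""
    free = sorted(o for o in unused if lo <= o <= hi)
    # Candidate starting points whose next count-1 OIDs are also free.
    starts = [free[i] for i in range(len(free) - count + 1)
              if free[i + count - 1] == free[i] + count - 1]
    return random.choice(starts) if starts else None

unused = set(range(8000, 10000)) - {8500, 9000}  # pretend these are taken
random.seed(1)
start = suggest_oid_range(unused, 5)
assert start is not None and all(start + i in unused for i in range(5))
```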
Re: Why don't we have a small reserved OID range for patch revisions?
On 3/1/19 12:41 AM, Peter Geoghegan wrote: > On Thu, Feb 28, 2019 at 3:09 PM Tom Lane wrote: >> The only thing that's really clear is that some senior committers don't >> want to be bothered because they don't think there's a problem here that >> justifies any additional expenditure of their time. Perhaps they are >> right, because I'd expected some comments from non-committer developers >> confirming that they see a problem, and the silence is deafening. > > I don't think that you can take that as too strong a signal. The > incentives are different for non-committers. > >> I'm inclined to commit some form of Naylor's tool improvement anyway, >> because I have use for it; I remember times when I've renumbered OIDs >> manually in patches, and it wasn't much fun. But I can't force a >> process change if there's not consensus for it among the committers. > > I think that that's a reasonable thing to do, provided there is > obvious feedback that makes it highly unlikely that the committer will > make an error at the last moment. I have a hard time coming up with a > suggestion that won't be considered annoying by at least one person, > though. > FWIW I personally would not mind if such tool / process was added. But I have a related question - do we have some sort of list of such processes that I could check? That is, a list of stuff that is expected to be done by a committer before a commit? I do recall we have [1], but perhaps we have something else. https://wiki.postgresql.org/wiki/Committing_checklist regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Why don't we have a small reserved OID range for patch revisions?
On Thu, Feb 28, 2019 at 3:09 PM Tom Lane wrote: > The only thing that's really clear is that some senior committers don't > want to be bothered because they don't think there's a problem here that > justifies any additional expenditure of their time. Perhaps they are > right, because I'd expected some comments from non-committer developers > confirming that they see a problem, and the silence is deafening. I don't think that you can take that as too strong a signal. The incentives are different for non-committers. > I'm inclined to commit some form of Naylor's tool improvement anyway, > because I have use for it; I remember times when I've renumbered OIDs > manually in patches, and it wasn't much fun. But I can't force a > process change if there's not consensus for it among the committers. I think that that's a reasonable thing to do, provided there is obvious feedback that makes it highly unlikely that the committer will make an error at the last moment. I have a hard time coming up with a suggestion that won't be considered annoying by at least one person, though. Would it be awful if there was a #warning directive that kicked in when the temporary OID range is in use? It should be possible to do that without breaking -Werror builds, which I believe Robert uses (I am reminded of the Flex bug that we used to have to work around). It's not like there are that many patches that need to assign OIDs to new catalog entries. I would suggest that we put the warning in the regression tests if I didn't know that that could be missed by the use of parallel variants, where the output flies by. There is no precedent for using #warning for something like that, but offhand it seems like the only thing that would work consistently. I don't really mind having to do slightly more work when the issue crops up, especially if that means less work for everyone involved in aggregate, which is the cost that I'm concerned about the most. 
However, an undocumented or under-documented process that requires a fixed amount of extra mental effort when committing *anything* is another matter. -- Peter Geoghegan
Re: Why don't we have a small reserved OID range for patch revisions?
Robert Haas writes: >>> just as a thought, what if we stopped assigning manual OIDs for new >>> catalog entries altogether, except for once at the end of each release >>> cycle? Actually ... that leads to an idea that wouldn't add any per-commit overhead, or really much change at all to existing processes. Given the existence of a reliable OID-renumbering tool, we could: 1. Encourage people to develop new patches using chosen-at-random high OIDs, in the 7K-9K range. They do this already, it'd just be encouraged instead of discouraged. 2. Commit patches as received. 3. Once each devel cycle, after feature freeze, somebody uses the renumbering tool to shove all the new OIDs down to lower numbers, freeing the high-OID range for the next devel cycle. We'd have to remember to do that, but it could be added to the RELEASE_CHANGES checklist. In this scheme, OID collisions are a problem for in-progress patches only if two patches are unlucky enough to choose the same random high OIDs during the same devel cycle. That's unlikely, or at least a good bit less likely than collisions are today. If/when it does happen we'd have a couple of alternatives for ameliorating the problem --- either the not-yet-committed patch could use the renumbering tool on their own OIDs, or we could do an off-schedule run of step 3 to get the already-committed OIDs out of their way. regards, tom lane
Re: propagating replica identity to partitions
Added Peter E to CC; question at the very end. On 2019-Feb-20, Robert Haas wrote: > Yeah, we could. I wonder, though, whether we should just make > everything recurse. I think that's what people are commonly going to > want, at least for partitioned tables, and it doesn't seem to me that > it would hurt anything to make the inheritance case work that way, > too. Right now it looks like we have this list of exceptions: > > - actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY) > - TRIGGER > - CLUSTER > - OWNER > - TABLESPACE never recurse to descendant tables > - Adding a constraint recurses only for CHECK constraints that are not > marked NO INHERIT. > > I have a feeling we eventually want this list to be empty, right? We > want a partitioned table to work as much like a non-partitioned table > as possible, unless the user asks for some other behavior. Going from > six exceptions to four and maybe having some of them depend on whether > it's partitioning or inheritance doesn't seem like it's as good and > clear as trying to adopt a really uniform policy. > > I don't buy Simon's argument that we should treat TABLESPACE > differently because the tables might be really big and take a long > time to move. I agree that this could well be true, but nobody is > proposing to remove the ability to move tables individually or to use > ONLY here. Allowing TABLESPACE to recurse just gives people one > additional choice that they don't have today: to move everything at > once. We don't lose any functionality by enabling that. Tablespaces already behave a little bit especially in another sense: it doesn't recurse to indexes. I think it's not possible to implement a three-way flag, where you tell it to move only the table, or to recurse to child tables but leave local indexes alone, or just to move everything. If we don't move the child indexes, are we really recursing in that sense? 
I think changing the behavior of tablespaces is likely to cause pain for people with existing scripts that expect the command not to recurse. We would have to add a switch of some kind to provide the old behavior in order not to break those, and that doesn't seem so good either. I agree we definitely want to treat a partitioned table as similarly as possible to a non-partitioned table, but in the particular case of tablespace it seems to me better not to mess with the behavior.

CLUSTER: I'm not sure about this one. For legacy inheritance there's no principled way to figure out the right index to recurse to. For partitioning that seems a very simple problem, but I do wonder if there's any use case at all for ALTER TABLE CLUSTER there. My inclination would be to reject ALTER TABLE CLUSTER on a partitioned table altogether, if it isn't already.

OWNER: let's just change this one to recurse always. I think we have a consensus about this one.

TRIGGER: I think it would be good to recurse, based on the trigger name. I'm not sure what to do about legacy inheritance; the trigger isn't likely to be there. An option is to silently ignore each inheritance child (not partition) if the trigger isn't present on it.

We all seem to agree that REPLICA IDENTITY should recurse. Maybe ADD GENERATED AS IDENTITY / DROP IDENTITY should recurse; not really sure about this one. Peter? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Index Skip Scan
On 3/1/19 12:03 AM, Thomas Munro wrote: > On Fri, Mar 1, 2019 at 11:23 AM Jeff Janes wrote: >> On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI >> wrote: >>> At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> >>> wrote in >>> A bit of adjustment after nodes/relation -> nodes/pathnodes. >>> >>> I had a look on this. >>> >>> The name "index skip scan" is a different feature from the >>> feature with the name on other prodcuts, which means "index scan >>> with postfix key (of mainly of multi column key) that scans >>> ignoring the prefixing part" As Thomas suggested I'd suggest that >>> we call it "index hop scan". (I can accept Hopscotch, either:p) >> >> >> I think that what we have proposed here is just an incomplete implementation >> of what other products call a skip scan, not a fundamentally different >> thing. They don't ignore the prefix part, they use that part in a way to >> cancel itself out to give the same answer, but faster. I think they would >> also use this skip method to get distinct values if that is what is >> requested. But they would go beyond that to also use it to do something >> similar to the plan we get with this: > > Hi Jeff, > > "Hop scan" was just a stupid joke that occurred to me when I saw that > DB2 had gone for "jump scan". I think "skip scan" is a perfectly good > name and it's pretty widely used by now (for example, by our friends > over at SQLite to blow us away at these kinds of queries). > +1 to "hop scan" regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Why don't we have a small reserved OID range for patch revisions?
Peter Geoghegan writes: > On Thu, Feb 28, 2019 at 7:59 AM Robert Haas wrote: >> OK. Well, I think that doing nothing is superior to this proposal, >> for reasons similar to what Peter Eisentraut has already articulated. >> And I think rather than blasting forward with your own preferred >> alternative in the face of disagreement, you should be willing to >> discuss other possible options. But if you're not willing to do that, >> I can't make you. > Peter seemed to not want to do this on the grounds that it isn't > necessary at all, whereas you think that it doesn't go far enough. If > there is a consensus against what Tom has said, it's a cacophonous one > that cannot really be said to be in favor of anything. The only thing that's really clear is that some senior committers don't want to be bothered because they don't think there's a problem here that justifies any additional expenditure of their time. Perhaps they are right, because I'd expected some comments from non-committer developers confirming that they see a problem, and the silence is deafening. I'm inclined to commit some form of Naylor's tool improvement anyway, because I have use for it; I remember times when I've renumbered OIDs manually in patches, and it wasn't much fun. But I can't force a process change if there's not consensus for it among the committers. regards, tom lane
Re: Index Skip Scan
On Fri, Mar 1, 2019 at 11:23 AM Jeff Janes wrote: > On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI > wrote: >> At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> >> wrote in >> > A bit of adjustment after nodes/relation -> nodes/pathnodes. >> >> I had a look on this. >> >> The name "index skip scan" is a different feature from the >> feature with the name on other prodcuts, which means "index scan >> with postfix key (of mainly of multi column key) that scans >> ignoring the prefixing part" As Thomas suggested I'd suggest that >> we call it "index hop scan". (I can accept Hopscotch, either:p) > > > I think that what we have proposed here is just an incomplete implementation > of what other products call a skip scan, not a fundamentally different thing. > They don't ignore the prefix part, they use that part in a way to cancel > itself out to give the same answer, but faster. I think they would also use > this skip method to get distinct values if that is what is requested. But > they would go beyond that to also use it to do something similar to the plan > we get with this: Hi Jeff, "Hop scan" was just a stupid joke that occurred to me when I saw that DB2 had gone for "jump scan". I think "skip scan" is a perfectly good name and it's pretty widely used by now (for example, by our friends over at SQLite to blow us away at these kinds of queries). Yes, simple distinct value scans are indeed just the easiest kind of thing to do with this kind of scan-with-fast-forward. As discussed already in this thread and the earlier one there is a whole family of tricks you can do, and the thing that most people call an "index skip scan" is indeed the try-every-prefix case where you can scan an index on (a, b) given a WHERE clause b = x. Perhaps the simple distinct scan could appear as "Distinct Index Skip Scan"? And perhaps the try-every-prefix-scan could appear as just "Index Skip Scan"? 
Whether these are the same executor node is a good question; at one point I proposed a separate nest-loop-like node for the try-every-prefix-scan, but Robert shot that down pretty fast. I now suspect (as he said) that all of this belongs in the index scan node, as different modes. The behaviour is overlapping; for "Distinct Index Skip Scan" you skip to each distinct prefix and emit one tuple, whereas for "Index Skip Scan" you skip to each distinct prefix and then perform a regular scan for the prefix + the suffix, emitting matches. > (which currently takes 200ms, rather than the 0.9ms taken for the one > benefiting from skip scan) Nice. > I don't think we should give it a different name, just because our initial > implementation is incomplete. +1 > Or do you think our implementation of one feature does not really get us > closer to implementing the other? My question when lobbing the earlier sketch patches into the mailing list a few years back was: is this simple index AM interface and implementation (once debugged) powerful enough for various kinds of interesting skip-based plans? So far I have the impression that it does indeed work for Distinct Index Skip Scan (demonstrated), Index Skip Scan (no one has yet tried that), and special cases of extrema aggregate queries (foo, MIN(bar) can be performed by skip scan of index on (foo, bar)), but may not work for the semi-related merge join trickery mentioned in a paper posted some time back (though I don't recall exactly why). Another question is whether it should all be done by the index scan node, and I think the answer is yes. -- Thomas Munro https://enterprisedb.com
Re: Drop type "smgr"?
On Fri, Mar 1, 2019 at 11:31 AM Shawn Debnath wrote: > On Thu, Feb 28, 2019 at 02:08:49PM -0800, Andres Freund wrote: > > On 2019-03-01 09:48:33 +1300, Thomas Munro wrote: > > > On Fri, Mar 1, 2019 at 7:24 AM Andres Freund wrote: > > > > On 2019-02-28 13:16:02 -0500, Tom Lane wrote: > > > > > Shawn Debnath writes: > > > > > > Another thought: my colleague Anton Shyrabokau suggested potentially > > > > > > re-using forknumber to differentiate smgrs. We are using 32 bits to > > > > > > map 5 entries today. > > > > > > > > > > Yeah, that seems like it might be a workable answer. > > > > > > > > Yea, if we just split that into two 16bit entries, there'd not be much > > > > lost. Some mild mild performance regression due to more granular > > > > memory->register reads/writes, but I can't even remotely see that > > > > matter. > > > > > > Ok, that's a interesting way to include it in BufferTag without making > > > it wider. But then how about the SMGR interface? I think that value > > > would need to be added to the smgropen() interface, and all existing > > > callers would pass in (say) SGMR_RELATION (meaning "use md.c"), and > > > new code would use SMGR_UNDO, SMGR_SLRU etc. That seems OK to me. > > > > Right, seems like we should do that independent of whether we end up > > reusing the dboid or not. > > Food for thought: if we are going to muck with the smgr APIs, would it > make sense to move away from SMgrRelation to something a bit more > generic like, say, SMgrHandle to better organize the internal contents > of this structure? Internally, we could move things into an union and > based on type of handle: relation, undo, slru/generic, translate the > contents correctly. As you can guess, SMgrRelationData today is very > specific to relations and holding md specific information whose memory > would be better served re-used for the other storage managers. > > Thoughts? Right, it does contain some md-specific stuff without even apologising. 
Also smgropen() was rendered non-virtual at some point (I mean that implementations don't even get a chance to initialise anything, which works out only because md-specific code has leaked into smgr.c). In one of my undo patches (which I'll post an updated version of on the appropriate thread soon) I added a void * called private_data so that undo_file.c can keep track of some stuff, but yeah I agree that more tidying up could be done. -- Thomas Munro https://enterprisedb.com
Re: Why don't we have a small reserved OID range for patch revisions?
On Thu, Feb 28, 2019 at 7:59 AM Robert Haas wrote: > I don't think this is the worst proposal ever. However, I also think > that it's not unreasonable to raise the issue that writing OR > reviewing OR committing a patch already involves adhering to a thicket > of undocumented rules. When somebody fails to adhere to one of those > rules, they get ignored or publicly shamed. Now you want to add yet > another step to the process - really two. There does seem to be a real problem with undocumented processes. For example, I must confess that it came as news to me that we already had a reserved OID range. However, I don't think that there is that much of an issue with adding new mechanisms like this, provided it makes it easy to do the right thing and hard to do the wrong thing. What Tom has proposed so far is not something that self-evidently meets that standard, but it's also not something that self-evidently fails to meet that standard. I have attempted to institute some general guidelines for what the thicket of rules are by creating the "committing checklist" page. This is necessarily imperfect, because the rules are in many cases open to interpretation, often for good practical reasons. I don't have any sympathy for committers that find it hard to remember to do a catversion bump with any kind of regularity. That complexity seems inherent, not incidental, since it's often convenient to ignore catalog incompatibilities during development. > So, I suspect that for every > unit of work it saves somebody, it's probably going to generate about > one unit of extra work for somebody else. Maybe so. I think that you're jumping to conclusions, though. > A lot of projects have a much less painful process for getting patches > integrated than we do. I don't know how those projects maintain > adequate code quality, but I do know that making it easy to get a > patch accepted makes people more likely to contribute patches, and > increases overall development velocity. 
It is not even vaguely > unreasonable to worry about whether making this more complicated is > going to hurt more than it helps, and I don't know why you think > otherwise. But you seem to want to make the mechanism itself even more complicated, not less complicated (based on your remarks about making OID assignment happen during the build). In order to make the use of the mechanism easier. That seems worth considering, but ISTM that this is talking at cross purposes. There are far simpler ways of making it unlikely that a committer is going to miss this step. There is also a simple way of noticing that they do quickly (e.g. a simple buildfarm test). > Right now every entry in pg_proc.dat includes an OID assignment. What > I'm proposing is that we would also allow entries that did not have > one, and the build process would assign one while processing the .dat > files. Then later, somebody could use a script that went through and > rewrote the .dat file to add OID assignments to any entries that > lacked them. Since the topic of having tools for automated rewrite of > those files has been discussed at length, and since we already have a > script called reformat_dat_file.pl in the tree which contains > comments indicating that it could be modified for bulk editing, said > script having been committed BY YOU, I don't understand why you think > that bulk editing is infeasible. I'm also curious to hear what Tom thinks about this. > OK. Well, I think that doing nothing is superior to this proposal, > for reasons similar to what Peter Eisentraut has already articulated. > And I think rather than blasting forward with your own preferred > alternative in the face of disagreement, you should be willing to > discuss other possible options. But if you're not willing to do that, > I can't make you. Peter seemed to not want to do this on the grounds that it isn't necessary at all, whereas you think that it doesn't go far enough. 
If there is a consensus against what Tom has said, it's a cacophonous one that cannot really be said to be in favor of anything. -- Peter Geoghegan
Re: Drop type "smgr"?
On Fri, Mar 1, 2019 at 10:41 AM Shawn Debnath wrote: > On Fri, Mar 01, 2019 at 10:33:06AM +1300, Thomas Munro wrote: > > It doesn't make any sense to put things like clog or any other SLRU in > > a non-default tablespace though. It's perfectly OK if not all smgr > > implementations know how to deal with tablespaces, and the SLRU > > support should just not support that. > > If the generic storage manager, or whatever we end up calling it, ends > up being generic enough - its possible that tablespace value would have > to be respected. Right, you and I have discussed this a bit off-list, but for the benefit of others, I think what you're getting at with "generic storage manager" here is something like this: on the one hand, our proposed revival of SMGR as a configuration point is about is supporting alternative file layouts for bufmgr data, but at the same time there is some background noise about direct IO, block encryption, ... and who knows what alternative block storage someone might come up with ... at the block level. So although it sounds a bit contradictory to be saying "let's make all these different SMGRs!" at the same time as saying "but we'll eventually need a single generic SMGR that is smart enough to be parameterised for all of these layouts!", I see why you say it. In fact, the prime motivation for putting SLRUs into shared buffers is to get better buffering, because (anecdotally) slru.c's mini-buffer scheme performs abysmally without the benefit of an OS page cache. If we add optional direct IO support (something I really want), we need it to apply to SLRUs, undo and relations, ideally without duplicating code, so we'd probably want to chop things up differently. At some point I think we'll need to separate the questions "how to map blocks to filenames and offsets" and "how to actually perform IO". 
I think the first question would be controlled by the SMGR IDs as discussed, but the second question probably needs to be controlled by GUCs that control all IO, and/or special per relation settings (supposing you can encrypt just one table, as a random example I know nothing about); but that seems way out of scope for the present projects. IMHO the best path from here is to leave md.c totally untouched for now as the SMGR for plain old relations, while we work on getting these new kinds of bufmgr data into the tree as a first step, and a later hypothetical direct IO or whatever project can pay for the refactoring to separate IO from layout. -- Thomas Munro https://enterprisedb.com
Re: propagating replica identity to partitions
On 2019-Feb-20, Robert Haas wrote: > On Wed, Feb 20, 2019 at 12:38 PM Alvaro Herrera > wrote: > > Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a > > separate para. I suggest: > > > > : This form changes the table's tablespace to the specified tablespace > > : and moves the data file(s) associated with the table to the new > > : tablespace. Indexes on the table, if any, are not moved; but they > > : can be moved separately with additional SET TABLESPACE commands. > > : When applied to a partitioned table, nothing is moved, but any > > : partitions created afterwards with CREATE TABLE PARTITION OF > > : will use that tablespace. > > : > > : All > > : tables in the current database in a tablespace can be moved by using > > : the ALL IN TABLESPACE form, which will lock all tables to be moved > > : first and then move each one. This form also supports OWNED BY, > > : which will only move tables owned by the roles specified. If the > > : NOWAIT option is specified then the command will fail if it is > > : unable to acquire all of the locks required immediately. Note that > > : system catalogs are not moved by this command, use ALTER DATABASE or > > : explicit ALTER TABLE invocations instead if desired. The > > : information_schema relations are not considered part of the system > > : catalogs and will be moved. See also CREATE TABLESPACE. > > Seems reasonable. Pushed this bit, thanks. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Index Skip Scan
On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI < horiguchi.kyot...@lab.ntt.co.jp> wrote: > Hello. > > At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> > wrote in aa+fz3guncutf52q1sufb7ise37tjpsd...@mail.gmail.com> > > A bit of adjustment after nodes/relation -> nodes/pathnodes. > > I had a look on this. > > The name "index skip scan" is a different feature from the > feature with the name on other products, which means "index scan > with postfix key (mainly of a multi-column key) that scans > ignoring the prefixing part". As Thomas suggested I'd suggest that > we call it "index hop scan". (I can accept Hopscotch, either:p) > I think that what we have proposed here is just an incomplete implementation of what other products call a skip scan, not a fundamentally different thing. They don't ignore the prefix part, they use that part in a way to cancel itself out to give the same answer, but faster. I think they would also use this skip method to get distinct values if that is what is requested. But they would go beyond that to also use it to do something similar to the plan we get with this:

Set up:

pgbench -i -s50
create index on pgbench_accounts (bid, aid);
alter table pgbench_accounts drop constraint pgbench_accounts_pkey;

Query:

explain analyze with t as (select distinct bid from pgbench_accounts) select pgbench_accounts.* from pgbench_accounts join t using (bid) where aid=5;

If we accept this patch, I hope it would be expanded in the future to give similar performance as the above query does even when the query is written in its more natural way of:

explain analyze select * from pgbench_accounts where aid=5;

(which currently takes 200ms, rather than the 0.9ms taken for the one benefiting from skip scan) I don't think we should give it a different name, just because our initial implementation is incomplete. Or do you think our implementation of one feature does not really get us closer to implementing the other? Cheers, Jeff
Re: Drop type "smgr"?
Andres Freund writes: > FWIW, I think while distasteful, I could see us actually using oids, > just ones that are small enough to fit into 16bit... If we suppose that all smgrs must be built-in, that's not even much of a restriction... regards, tom lane
Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
Alvaro Herrera writes: > On 2019-Feb-28, Tom Lane wrote: >> I wasn't really working on that for v12 --- I figured it was way >> too late in the cycle to be starting on such a significant change. > Oh, well, it certainly seems far too late *now*. However, what about > the idea in > https://postgr.es/m/1255.1544562...@sss.pgh.pa.us > namely that we write out the buffers involved? That sounds like it > might be backpatchable, and thus it's not too late for it. I think that what we had in mind at that point was that allowing forced writes of empty-but-dirty pages would provide a back-patchable solution to the problem of ftruncate() failure leaving corrupt state on-disk. That would not, by itself, remove the need for AccessExclusiveLock, so it doesn't seem like it would eliminate people's desire for the kind of knob being discussed here. Thinking about it, the need for AEL is mostly independent of the data corruption problem; rather, it's a hack to avoid needing to think about concurrent-truncation scenarios in table readers. We could fairly easily reduce the lock level to something less than AEL if we just taught seqscans, indexscans, etc that trying to read a page beyond EOF is not an error. (Reducing the lock level to the point where we could allow concurrent *writers* is a much harder problem, I think. But to ameliorate the issues for standbys, we just need to allow concurrent readers.) And we'd have to do something about readers possibly loading doomed pages back into shmem before the truncation happens; maybe that can be fixed just by truncating first and flushing buffers second? I think the $64 question is whether we're giving up any meaningful degree of error detection if we allow read-beyond-EOF to not be an error. If we conclude that we're not, maybe it wouldn't be a very big patch? regards, tom lane
Re: Drop type "smgr"?
On 2019-03-01 09:48:33 +1300, Thomas Munro wrote: > On Fri, Mar 1, 2019 at 7:24 AM Andres Freund wrote: > > On 2019-02-28 13:16:02 -0500, Tom Lane wrote: > > > Shawn Debnath writes: > > > > Another thought: my colleague Anton Shyrabokau suggested potentially > > > > re-using forknumber to differentiate smgrs. We are using 32 bits to > > > > map 5 entries today. > > > > > > Yeah, that seems like it might be a workable answer. > > > > Yea, if we just split that into two 16bit entries, there'd not be much > > lost. Some mild mild performance regression due to more granular > > memory->register reads/writes, but I can't even remotely see that > > matter. > > Ok, that's a interesting way to include it in BufferTag without making > it wider. But then how about the SMGR interface? I think that value > would need to be added to the smgropen() interface, and all existing > callers would pass in (say) SGMR_RELATION (meaning "use md.c"), and > new code would use SMGR_UNDO, SMGR_SLRU etc. That seems OK to me. Right, seems like we should do that independent of whether we end up reusing the dboid or not. > > > Since, per this discussion, the smgr IDs need not really be OIDs at > > > all, we just need a few bits for them. > > > > Personally I find having them as oids more elegant than the distaste > > from misusing the database oid for a wihle, but I think it's fair to > > disagree here. > > It sounds like your buffer mapping redesign would completely change > the economics and we could reconsider much of this later without too > much drama; that was one of the things that made me feel that the > magic database OID approach was acceptable at least in the short term. Right. FWIW, I think while distasteful, I could see us actually using oids, just ones that are small enough to fit into 16bit... Greetings, Andres Freund
Re: POC: converting Lists into arrays
Here's a rebased version of the main patch. David Rowley writes: > The only thing that I did to manage to speed the patch up was to ditch > the additional NULL test in lnext(). I don't see why that's required > since lnext(NULL) would have crashed with the old implementation. I adopted this idea. I think at one point where I was fooling with different implementations for foreach(), it was necessary that lnext() be cool with a NULL input; but as things stand now, it's not. I haven't done anything else in the performance direction, but am planning to play with that next. I did run through all the list_delete_foo callers and fix the ones that were still busted. I also changed things so that with DEBUG_LIST_MEMORY_USAGE enabled, list deletions would move the data arrays around, in hopes of catching more stale-pointer problems. Depressingly, check-world still passed with that added, even before I'd fixed the bugs I found by inspection. This does not speak well for the coverage of our regression tests. regards, tom lane reimplement-List-as-array-2.patch.gz Description: reimplement-List-as-array-2.patch.gz
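For readers following along, the core idea of the patch — List cells stored in one contiguous array, so that foreach degenerates into a plain index walk instead of pointer chasing — can be sketched like this. The names below are illustrative only, not the actual pg_list.h API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical array-backed list, loosely modeled on the patch's approach. */
typedef struct ArrayList
{
    int   length;
    int   capacity;
    int  *elements;             /* int cells only, for brevity */
} ArrayList;

static ArrayList *
alist_make(void)
{
    ArrayList *l = malloc(sizeof(ArrayList));

    l->length = 0;
    l->capacity = 4;
    l->elements = malloc(l->capacity * sizeof(int));
    return l;
}

static ArrayList *
alist_append(ArrayList *l, int datum)
{
    /* grow the backing array; note this can move the data, which is
     * exactly why stale-pointer bugs show up with this design */
    if (l->length == l->capacity)
    {
        l->capacity *= 2;
        l->elements = realloc(l->elements, l->capacity * sizeof(int));
    }
    l->elements[l->length++] = datum;
    return l;
}

/* foreach is now just an index loop; no lnext(), no NULL cell to test */
#define alist_foreach(i, l) \
    for ((i) = 0; (i) < (l)->length; (i)++)

static int
alist_sum(ArrayList *l)
{
    int i, sum = 0;

    alist_foreach(i, l)
        sum += l->elements[i];
    return sum;
}
```

This also shows why deletions that move the array around (as with DEBUG_LIST_MEMORY_USAGE above) are a good way to flush out callers holding onto stale cell pointers.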
Re: Index Skip Scan
On Wed, Feb 20, 2019 at 11:33 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > On Fri, Feb 1, 2019 at 8:24 PM Jesper Pedersen <
> jesper.peder...@redhat.com> wrote:
> >
> > Dmitry and I will look at this and take it into account for the next
> > version.
>
> In the meantime, just to not forget, I'm going to post another version with a
> fix for cursor fetch backwards, which was crashing before.

This version of the patch can return the wrong answer.

create index on pgbench_accounts (bid, aid);
begin;
declare c cursor for select distinct on (bid) bid, aid
    from pgbench_accounts order by bid, aid;

fetch 2 from c;
 bid |   aid
-----+---------
   1 |       1
   2 | 100,001

fetch backward 1 from c;
 bid |   aid
-----+---------
   1 | 100,000

Without the patch, instead of getting a wrong answer, I get an error:

ERROR:  cursor can only scan forward
HINT:  Declare it with SCROLL option to enable backward scan.

If I add "SCROLL", then I do get the right answer with the patch.

Cheers, Jeff
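The invariant the cursor is violating can be stated concretely: for input sorted on (bid, aid), DISTINCT ON (bid) must return, for each bid, the tuple with the smallest aid — and a backward fetch must revisit exactly those tuples in reverse. A small model of that invariant (illustrative only, not the skip-scan code itself):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int bid; int aid; } Tuple;

/*
 * Model of what DISTINCT ON (bid) ... ORDER BY bid, aid must produce from
 * input sorted on (bid, aid): the first tuple of each bid group.  A correct
 * backward fetch must walk this same result set in reverse; the reported bug
 * returned a different (bid, aid) pair when scanning backward.
 */
static size_t
distinct_on_bid(const Tuple *in, size_t n, Tuple *out)
{
    size_t nout = 0;

    for (size_t i = 0; i < n; i++)
        if (i == 0 || in[i].bid != in[i - 1].bid)
            out[nout++] = in[i];
    return nout;
}
```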
Re: [HACKERS] [PROPOSAL] Temporal query processing with range types
Dear all, we rebased our temporal normalization patch on top of 554ebf687852d045f0418d3242b978b49f160f44 from 2019-02-28. On 9/7/18 1:02 PM, Peter Moser wrote: > The syntax is > SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c; Please find all information about our decisions and current state within the previous email. > What we like to discuss now is: > - Is sort_inner_and_outer the correct place to perform this split? > - How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE > mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is > not an equality operator, but if we use RANGE_EQ it assumes that the > right-hand-side of the operator must be a range as well. > - Should we better change our mergeclause to a RANGE_ELEM_CONTAINED > comparison, or keep RANGE_EQ and fix pathkeys later? > - How do we update equivalence classes, and pathkeys > when changing the inner relation's data type from "int4range" to "int" > in the query tree inside "sort_inner_and_outer" to get the correct > ordering and data types I will also add this prototype (WIP) patch to the commitfest of March, as suggested by two developers met at the FOSDEM some weeks ago. Best regards, Anton, Johann, Michael, Peter diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c index 2a1d000b03..a309596fa1 100644 --- a/src/backend/executor/nodeMergejoin.c +++ b/src/backend/executor/nodeMergejoin.c @@ -99,6 +99,106 @@ #include "utils/lsyscache.h" #include "utils/memutils.h" +// XXX TEMPORAL NORMALIZE PEMOSER +// !!! THis is just for prototyping, delete asap... 
+
+#include "catalog/pg_operator.h"
+#include "nodes/nodeFuncs.h"
+#include "utils/fmgroids.h"
+#include "utils/rangetypes.h"
+#include "utils/typcache.h"
+#include "access/htup_details.h"	/* for heap_getattr */
+#include "nodes/print.h"			/* for print_slot */
+#include "utils/datum.h"			/* for datumCopy */
+
+
+
+#define TEMPORAL_DEBUG
+/*
+ * #define TEMPORAL_DEBUG
+ * XXX PEMOSER Maybe we should use execdebug.h stuff here?
+ */
+#ifdef TEMPORAL_DEBUG
+static char*
+datumToString(Oid typeinfo, Datum attr)
+{
+	Oid			typoutput;
+	bool		typisvarlena;
+
+	getTypeOutputInfo(typeinfo, &typoutput, &typisvarlena);
+	return OidOutputFunctionCall(typoutput, attr);
+}
+
+#define TPGdebug(...) { printf(__VA_ARGS__); printf("\n"); fflush(stdout); }
+#define TPGdebugDatum(attr, typeinfo) TPGdebug("%s = %s %ld\n", #attr, datumToString(typeinfo, attr), attr)
+#define TPGdebugSlot(slot) { printf("Printing Slot '%s'\n", #slot); print_slot(slot); fflush(stdout); }
+
+#else
+#define datumToString(typeinfo, attr)
+#define TPGdebug(...)
+#define TPGdebugDatum(attr, typeinfo)
+#define TPGdebugSlot(slot)
+#endif
+
+TypeCacheEntry *testmytypcache;
+#define setSweepline(datum) \
+	node->sweepline = datumCopy(datum, node->datumFormat->attbyval, node->datumFormat->attlen)
+
+#define freeSweepline() \
+	if (! node->datumFormat->attbyval) pfree(DatumGetPointer(node->sweepline))
+
+/*
+ * slotGetAttrNotNull
+ *		Same as slot_getattr, but throws an error if NULL is returned.
+ */
+static Datum
+slotGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool		isNull;
+	Datum		result;
+
+	result = slot_getattr(slot, attnum, &isNull);
+
+	if (isNull)
+		ereport(ERROR,
+				(errcode(ERRCODE_NOT_NULL_VIOLATION),
+				 errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+						"adjustment not possible.",
+						NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname),
+						attnum)));
+
+	return result;
+}
+
+/*
+ * heapGetAttrNotNull
+ *		Same as heap_getattr, but throws an error if NULL is returned.
+ */
+static Datum
+heapGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool		isNull;
+	Datum		result;
+	HeapTuple	tuple;
+
+	tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+	result = heap_getattr(tuple,
+						  attnum,
+						  slot->tts_tupleDescriptor,
+						  &isNull);
+	if (isNull)
+		ereport(ERROR,
+				(errcode(ERRCODE_NOT_NULL_VIOLATION),
+				 errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+						"adjustment not possible.",
+						NameStr(TupleDescAttr(slot->tts_tupleDescriptor,
+											  attnum - 1)->attname),
+						attnum)));
+
+	return result;
+}
+
+// XXX TEMPORAL NORMALIZE PEMOSER END
+

 /*
  * States of the ExecMergeJoin state machine
@@ -138,6 +238,10 @@ typedef struct MergeJoinClauseData
 	 * stored here. */
 	SortSupportData ssup;
+
+	/* needed for Temporal Normalization */
+	bool		isnormalize;
+	TypeCacheEntry *range_typcache;
 } MergeJoinClauseData;

 /* Result type for MJEvalOuterValues and MJEvalInnerValues */
@@ -152,6 +256,59 @@ typedef enum
 #define MarkInnerTuple(innerTupleSlot, mergestate) \
 	ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot))

+/*
+ * temporalAdjustmentStoreTuple
+ *		While we
Re: Drop type "smgr"?
On Fri, Mar 1, 2019 at 4:09 AM Tom Lane wrote: > Thomas Munro writes: > > On Thu, Feb 28, 2019 at 7:37 PM Tom Lane wrote: > >> Thomas Munro writes: > >>> Our current thinking is that smgropen() should know how to map a small > >>> number of special database OIDs to different smgr implementations > > >> Hmm. Maybe mapping based on tablespaces would be a better idea? > > > In the undo log proposal (about which more soon) we are using > > tablespaces for their real purpose, so we need that OID. If you SET > > undo_tablespaces = foo then future undo data created by your session > > will be written there, which might be useful for putting that IO on > > different storage. > > Meh. That's a point, but it doesn't exactly seem like a killer argument. > Just in the abstract, it seems much more likely to me that people would > want per-database special rels than per-tablespace special rels. And > I think your notion of a GUC that can control this is probably pie in > the sky anyway: if we can't afford to look into the catalogs to resolve > names at this code level, how are we going to handle a GUC? 
I have this working like so:

* undo logs have a small amount of meta-data in shared memory, stored in
  a file at checkpoint time, with all changes WAL-logged, visible to
  users in the pg_stat_undo_logs view

* one of the properties of an undo log is its tablespace (the point here
  being that it's not in a catalog)

* you don't need access to any catalogs to find the backing files for a
  RelFileNode (the path via tablespace symlinks is derivable from
  spcNode)

* therefore you can find your way from an UndoLogRecPtr in (say) a zheap
  page to the relevant blocks on disk without any catalog access; this
  should work even in the apparently (but not actually) circular case of
  a pg_tablespace catalog that is stored in zheap (not something we can
  do right now, but hypothetically speaking), and has undo data that is
  stored in some non-default tablespace that must be consulted while
  scanning the catalog (not that I'm suggesting it would necessarily be
  a good idea to put catalogs in non-default tablespaces; I'm just
  addressing your theoretical point)

* the GUC is used to resolve tablespace names to OIDs only by sessions
  that are writing, when selecting (or creating) an undo log to attach
  to and begin writing into; those sessions have no trouble reading the
  catalog to do so without problematic circularities, as above

Seems to work; the main complications so far were coming up with
reasonable behaviour and interlocking when you drop tablespaces that
contain undo logs (short version: if they're not needed for snapshots or
rollback, they are dropped, wasting the rest of their undo address
space; otherwise they prevent the tablespace from being dropped, with a
clear message to that effect).

It doesn't make any sense to put things like clog or any other SLRU in a
non-default tablespace though. It's perfectly OK if not all smgr
implementations know how to deal with tablespaces, and the SLRU support
should just not support that.
> The real reason I'm concerned about this, though, is that for either > a database or a tablespace, you can *not* get away with having a magic > OID just hanging in space with no actual catalog row matching it. > If nothing else, you need an entry there to prevent someone from > reusing the OID for another purpose. And a pg_database row that > doesn't correspond to a real database is going to break all kinds of > code, starting with pg_upgrade and the autovacuum launcher. Special > rows in pg_tablespace are much less likely to cause issues, because > of the precedent of pg_global and pg_default. GetNewObjectId() never returns values < FirstNormalObjectId. I don't think it's impossible for someone to want to put SMGRs in a catalog of some kind some day. Even though the ones for clog, undo etc would still probably need special hard-coded treatment as discussed, I suppose it's remotely possible that someone might some day figure out a useful way to allow extensions that provide different block storage (nvram? zfs zvols? encryption? (see Haribabu's reply)) but I don't have any specific ideas about that or feel inclined to design something for unknown future use. -- Thomas Munro https://enterprisedb.com
Re: get_controlfile() can leak fds in the backend
On 2/28/19 7:20 AM, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
>> Sure, will do. What are your thoughts on backpatching? This seems
>> unlikely to be a practical concern in the field, so my inclination is a
>> master only fix.
>
> I agree that this would unlikely become an issue as an error on the
> control file would most likely be a PANIC when updating it, so fixing
> only HEAD sounds fine to me. Thanks for asking!

Committed and pushed that way.

By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worthwhile searching for and
fixing those instances?

Joe
-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

signature.asc
Description: OpenPGP digital signature
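The fix Joe describes is mechanical: code paths that only read should request O_RDONLY. A plain-POSIX sketch of the pattern (standing in for OpenTransientFile/CloseTransientFile; the helper name is invented):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Read-only access should open with O_RDONLY, not O_RDWR: it documents
 * intent and still works on files the process can only read.
 */
static ssize_t
read_whole_file(const char *path, char *buf, size_t bufsize)
{
    int      fd = open(path, O_RDONLY);   /* not O_RDWR */
    ssize_t  nread;

    if (fd < 0)
        return -1;
    nread = read(fd, buf, bufsize);
    close(fd);                            /* no fd leak on the success path */
    return nread;
}
```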
Re: get_controlfile() can leak fds in the backend
Hi,

On 2019-02-28 09:54:48 +0100, Fabien COELHO wrote:
>> If we were to want to do more here, ISTM the right approach would use
>> the postmaster pid file, not the control file.
>
> ISTM that this just means re-inventing a manual poor-featured
> race-condition-prone lock API around another file, which seems to be created
> more or less only by "pg_ctl", while some other commands use the control
> file (eg pg_rewind, AFAICS).

Huh? Postmaster.pid is written by the backend; pg_ctl just checks it to
see if the backend has finished starting up etc. It's precisely what the
backend uses to prevent two postmasters from starting, etc. It's also
what, say, pg_resetwal checks to protect against a concurrently running
cluster (albeit in a racy way). If we want to make things more
bulletproof, that's the place.

The control file is constantly written to, sometimes by different
processes; it'd just not be a good file for such lockout mechanisms.

Greetings,

Andres Freund
Re: Drop type "smgr"?
On Fri, Mar 1, 2019 at 7:24 AM Andres Freund wrote:
> On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> > Shawn Debnath writes:
> > > Another thought: my colleague Anton Shyrabokau suggested potentially
> > > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > > map 5 entries today.
> >
> > Yeah, that seems like it might be a workable answer.
>
> Yea, if we just split that into two 16bit entries, there'd not be much
> lost. Some mild mild performance regression due to more granular
> memory->register reads/writes, but I can't even remotely see that
> matter.

Ok, that's an interesting way to include it in BufferTag without making
it wider. But then how about the SMGR interface? I think that value
would need to be added to the smgropen() interface, and all existing
callers would pass in (say) SMGR_RELATION (meaning "use md.c"), and new
code would use SMGR_UNDO, SMGR_SLRU etc. That seems OK to me.

Ancient POSTGRES had an extra argument like that and would say e.g.
smgropen(rd->rd_rel->relsmgr, rd), but in this new idea I think it'd
always be a constant or a value from a BufferTag, and the BufferTag
would have been set with a constant, since the code reading these
buffers would always be code that knows which it wants. We'd be using
this new argument not as a modifier to control storage location as they
did back then, but rather as a whole new namespace for RelFileNode
values, that also happens to be stored differently; that might be a hint
that it could also go into RelFileNode (but I'm not suggesting that).

> > Since, per this discussion, the smgr IDs need not really be OIDs at
> > all, we just need a few bits for them.
>
> Personally I find having them as oids more elegant than the distaste
> from misusing the database oid for a wihle, but I think it's fair to
> disagree here.
It sounds like your buffer mapping redesign would completely change the economics and we could reconsider much of this later without too much drama; that was one of the things that made me feel that the magic database OID approach was acceptable at least in the short term. -- Thomas Munro https://enterprisedb.com
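The 16/16 split being discussed could look roughly like this — a sketch only, with invented macro and field names, not a proposal for the actual BufferTag layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical split of the 32-bit fork-number slot into two 16-bit halves:
 * the high half identifies the smgr, the low half keeps the fork number. */
typedef uint32_t ForkSmgrField;

#define MAKE_FORK_SMGR(smgrid, forknum) \
    (((uint32_t) (smgrid) << 16) | ((uint32_t) (forknum) & 0xFFFF))
#define FIELD_SMGR(f)    ((uint16_t) ((f) >> 16))
#define FIELD_FORKNUM(f) ((uint16_t) ((f) & 0xFFFF))

/* Illustrative smgr ids, mirroring the SMGR_RELATION/SMGR_UNDO idea above */
enum
{
    SMGR_RELATION = 0,          /* "use md.c" */
    SMGR_UNDO = 1,
    SMGR_SLRU = 2
};
```

The point of the sketch is that the tag stays exactly 4 bytes wide, at the cost of the slightly more granular loads/stores Andres mentions.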
Re: Segfault when restoring -Fd dump on current HEAD
> On Thu, Feb 28, 2019 at 9:24 PM Alvaro Herrera
> wrote:
>
> Pushed, thanks. I added the reminder comment I mentioned.

Thank you, and sorry for the trouble.
Re: ATTACH/DETACH PARTITION CONCURRENTLY
On Tue, Feb 26, 2019 at 5:10 PM Robert Haas wrote: > Aside from these problems, I think I have spotted a subtle problem in > 0001. I'll think about that some more and post another update. 0001 turned out to be guarding against the wrong problem. It supposed that if we didn't get a coherent view of the system catalogs due to concurrent DDL, we could just AcceptInvalidationMessages() and retry. But that turns out to be wrong, because there's a (very) narrow window after a process removes itself from the ProcArray and before it sends invalidation messages. It wasn't difficult to engineer an alternative solution that works, but unfortunately it's only good enough to handle the ATTACH case, so this is another thing that will need more thought for concurrent DETACH. Anyway, the updated 0001 contains that code and some explanatory comments. The rest of the series is substantially unchanged. I'm not currently aware of any remaining correctness issues with this code, although certainly there may be some. There has been a certain dearth of volunteers to review any of this. I do plan to poke at it a bit to see whether it has any significant performance impact, but not today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company v4-0004-Reduce-the-lock-level-required-to-attach-a-partit.patch Description: Binary data v4-0002-Ensure-that-repeated-PartitionDesc-lookups-return.patch Description: Binary data v4-0001-Teach-RelationBuildPartitionDesc-to-cope-with-con.patch Description: Binary data v4-0003-Teach-runtime-partition-pruning-to-cope-with-conc.patch Description: Binary data
Re: Segfault when restoring -Fd dump on current HEAD
On 2019-Feb-27, Dmitry Dolgov wrote: > > On Wed, Feb 27, 2019 at 1:32 PM Alvaro Herrera > > wrote: > > > > > > I think it would be better to just put back the .defn = "" (etc) to the > > > > ArchiveEntry calls. > > > > > > Then we should do this not only for defn, but for owner and dropStmt too. > > > > Yeah, absolutely. > > Done, please find the attached patch. Pushed, thanks. I added the reminder comment I mentioned. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
On 2019-Feb-28, Tom Lane wrote: > Alvaro Herrera writes: > > Hopefully we'll get Tom's patch that addresses the failure-to-truncate > > issues in pg12. > > Hm, are you speaking of the handwaving I did in > https://www.postgresql.org/message-id/2348.1544474...@sss.pgh.pa.us > ? > > I wasn't really working on that for v12 --- I figured it was way > too late in the cycle to be starting on such a significant change. Oh, well, it certainly seems far too late *now*. However, what about the idea in https://postgr.es/m/1255.1544562...@sss.pgh.pa.us namely that we write out the buffers involved? That sounds like it might be backpatchable, and thus it's not too late for it. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Wed, Feb 27, 2019 at 3:16 AM Ideriha, Takeshi wrote: > I'm afraid I may be quibbling about it. > What about users who understand performance drops but don't want to > add memory or decrease concurrency? > I think that PostgreSQL has a parameter > which most of users don't mind and use is as default > but a few of users want to change it. > In this case as you said, introducing hard limit parameter causes > performance decrease significantly so how about adding detailed caution > to the document like planner cost parameter? There's nothing wrong with a parameter that is useful to some people and harmless to everyone else, but the people who are proposing that parameter still have to demonstrate that it has those properties. This email thread is really short on clear demonstrations that X or Y is useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: FETCH FIRST clause PERCENT option
On 2/28/19 12:26 PM, Kyotaro HORIGUCHI wrote: > Hello. > > At Sat, 23 Feb 2019 22:27:44 +0100, Tomas Vondra > wrote in > <81a5c0e9-c17d-28f3-4647-8a4659cdf...@2ndquadrant.com> >> >> >> On 2/23/19 8:53 AM, Surafel Temesgen wrote: >>> >>> >>> On Sun, Feb 10, 2019 at 2:22 AM Tomas Vondra >>> mailto:tomas.von...@2ndquadrant.com>> wrote: >>> >>> >>> >>> I'm not sure I understand - are you saying every time the user does a >>> FETCH, we have to run the outer plan from scratch? I don't see why would >>> that be necessary? And if it is, how come there's no noticeable >>> performance difference? >>> >>> Can you share a patch implementing the incremental approach, and a query >>> demonstrating the issue? >>> >>> >>> I didn't implement it but its obvious that it doesn't work similarly >>> with previous approach. >>> >> >> Sure, but that's hardly a sufficient argument for the current approach. >> >>> We need different implementation and my plan was to use tuplestore per >>> call and clear >>> >>> it after returning tuple but I see that the plan will not go far because >>> mainly the last returned >>> >>> slot is not the last slot we get from outerPlan execution >>> >> >> I'm sorry, I still don't understand what the supposed problem is. I >> don't think it's all that different from what nodeMaterial.c does, for >> example. >> >> As I explained before, having to execute the outer plan till completion >> before returning any tuples is an issue. So either it needs fixing or an >> explanation why it's not an issue. > > One biggest issue seems to be we don't know the total number of > outer tuples before actually reading a null tuple. I doubt of > general shortcut for that. It also seems preventing limit node > from just using materialized outer. > Sure, if you actually want all tuples, you'll have to execute the outer plan till completion. But that's not what I'm talking about - what if we only ever need to read one row from the limit? 
To give you an (admittedly somewhat contrived and artificial) example:

SELECT * FROM t1 WHERE id IN (
    SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY
);

Maybe this example is bogus and/or does not really matter in practice. I
don't know, but I've been unable to convince myself that's the case.

regards

-- 
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
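The underlying difficulty is that a percentage limit depends on the total row count: nothing can be returned until the outer plan's cardinality is known, which is why a naive implementation materializes everything first. The arithmetic itself is trivial; here is a sketch (the round-up rule is an assumption, not necessarily what the patch does):

```c
#include <assert.h>

/*
 * How many rows "FETCH FIRST percent PERCENT" keeps out of ntotal.
 * Rounds up, i.e. ceil(ntotal * percent / 100), done in integer math
 * to avoid pulling in libm -- purely illustrative.
 */
static long
percent_limit(long ntotal, long percent)
{
    if (percent <= 0)
        return 0;
    if (percent >= 100)
        return ntotal;
    return (ntotal * percent + 99) / 100;
}
```

Note that percent_limit() cannot be evaluated until ntotal is known — which is exactly why the LIMIT node can't just stream tuples the way an ordinary row-count LIMIT can.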
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Hi,

Attached is an updated version of this patch series. I've decided to
rebase and send both parts (MCV and histograms), although we've agreed
to focus on the MCV part for now. I don't want to leave the histogram
lagging behind, because (a) then it'd be much more work to update it,
and (b) I think it's useful feedback about likely future changes.

This should address most of the issues pointed out by David in his
recent reviews. Briefly:

1) It fixes/updates a number of comments and docs in various places,
removes redundant comments etc. In most cases I've simply adopted the
wording proposed by David, with minor tweaks in a couple of cases.

2) Reverts changes that exposed analyze_mcv_list - this was a leftover
from the attempt to reuse the single-column algorithm, but we've since
agreed it's not the right approach. So this change is unnecessary.

3) I've tweaked the code to accept RelabelType nodes as supported,
similarly to what examine_variable() does. Previously I concluded we
can't support RelabelType, but it seems that reasoning was bogus. I've
slightly tweaked the regression tests by changing one of the columns to
varchar, so that the queries actually trigger this.

4) I've tweaked a couple of places (UpdateStatisticsForTypeChange,
statext_clauselist_selectivity and estimate_num_groups_simple) per
David's suggestions. Those were fairly straightforward simplifications.

5) I've removed mcv_count from statext_mcv_build(). As David pointed
out, this was not actually needed - it was another remnant of the
attempt to re-use analyze_mcv_list(), which needs such an array. But
without it we can access the groups directly.

6) One of the review questions was about the purpose of this code:

    for (i = 0; i < nitems; i++)
    {
        if (groups[i].count < mincount)
        {
            nitems = i;
            break;
        }
    }

It's quite simple - we want to include groups with more occurrences than
mincount, and the groups are sorted by the count (in descending order).
So we simply find the first group with count below mincount, and the
index is the number of groups to keep. I've tried to explain that in a
comment.

7) I've fixed a bunch of format patterns in statext_mcv_deserialize(),
particularly those that confused %d and %u. We can't, however, use %d
for VARSIZE_ANY_EXHDR, because that macro expands into offsetof() etc.,
so that would trigger compiler warnings.

8) Yeah, pg_stats_ext_mcvlist_items was broken. The issue was that one
of the output parameters is defined as boolean[], but the function was
building just a string. Originally it used BuildTupleFromCStrings(), but
then it switched to heap_form_tuple() without building a valid array.
I've decided to simply revert back to BuildTupleFromCStrings(). It's not
going to be used very frequently, so the small performance difference is
not important. I've also fixed the formatting issues pointed out by
David.

regards

-- 
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

0001-multivariate-MCV-lists-20190228.patch.gz
Description: application/gzip

0002-multivariate-histograms-20190228.patch.gz
Description: application/gzip
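The mincount cutoff from item 6 can be demonstrated standalone. This is a hedged restatement of the logic, not the patch's actual code:

```c
#include <assert.h>

/*
 * Given group counts sorted in descending order, return how many leading
 * groups have count >= mincount -- i.e. the new nitems after the trimming
 * loop discussed above.
 */
static int
trim_groups(const int *counts, int nitems, int mincount)
{
    for (int i = 0; i < nitems; i++)
        if (counts[i] < mincount)
            return i;           /* index of first too-rare group */
    return nitems;              /* every group clears the threshold */
}
```

Because the counts are sorted descending, the first group below mincount marks the boundary: its index is exactly the number of groups worth keeping.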
Re: some ri_triggers.c cleanup
On 2019-02-25 17:17, Corey Huinker wrote: > Right, this makes a lot of sense, similar to how ri_restrict() combines > RESTRICT and NO ACTION. > > > I'm pretty sure that's where I got the idea, yes. Committed, including your patch. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_partition_tree crashes for a non-defined relation
On 2019-Feb-28, Michael Paquier wrote:
> On Wed, Feb 27, 2019 at 03:48:08PM -0300, Alvaro Herrera wrote:
> > I just happened to come across the result of this rationale in
> > pg_partition_tree() (an SRF) while developing a new related function,
> > pg_partition_ancestors(), and find the resulting behavior rather absurd
> > -- it returns one row with all NULL columns, rather than no rows. I
> > think the sensible behavior would be to do SRF_RETURN_DONE() before
> > stashing any rows to the output, so that we get an empty result set
> > instead.
>
> Hmm. Going through the thread again NULL was decided to make the
> whole experience consistent, now by returning nothing we would get
> a behavior as consistent as when NULL is used in input, so point taken
> to tune the behavior for unsupported relkinds and undefined objects.

Right, thanks.

> Does the attached look fine to you?

Yeah, looks good, please push.

What about legacy inheritance? I see that pg_partition_tree handles
that case perfectly well -- it seems to return the complete hierarchy
rooted at the given relation. However, it seems odd that it works at
all, don't you think? Consider this:

create table t1 (a int);
create table t11 () inherits (t1);
create table t2 (b int);
create table t111() inherits (t1, t2);

alvherre=# select * from pg_partition_tree('t1');
 relid | parentrelid | isleaf | level 
-------+-------------+--------+-------
 t1    |             | t      |     0
 t11   | t1          | t      |     1
 t111  | t1          | t      |     1
(3 rows)

OK so far... but look at t2's tree:

alvherre=# select * from pg_partition_tree('t2');
 relid | parentrelid | isleaf | level 
-------+-------------+--------+-------
 t2    |             | t      |     0
 t111  | t1          | t      |     2

I think this one is just weird -- t1 is not listed as "relid" anywhere,
and why does t111 have level=2?

I would opt for returning the empty set for legacy inheritance too.
More generally, I think we should return empty for anything that is
neither RELKIND_PARTITIONED_TABLE nor has relispartition set.
-- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: POC: converting Lists into arrays
David Rowley writes: > On Thu, 28 Feb 2019 at 09:26, Tom Lane wrote: >> 0001 below does this. I found a couple of places that could use >> forfive(), as well. I think this is a clear legibility and >> error-proofing win, and we should just push it. > I've looked over this and I agree that it's a good idea. Reducing the > number of lnext() usages seems like a good idea in order to reduce the > footprint of the main patch. I've pushed that; thanks for reviewing! >> 0002 below does this. I'm having a hard time deciding whether this >> part is a good idea or just code churn. It might be more readable >> (especially to newbies) but I can't evaluate that very objectively. > I'm less decided on this. Yeah, I think I'm just going to drop that idea. People didn't seem very sold on list_cell_is_last() being a readability improvement, and it certainly does nothing to reduce the footprint of the main patch. I now need to rebase the main patch over what I pushed; off to do that next. regards, tom lane
Re: Question about pg_upgrade from 9.2 to X.X
here is the data,

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

template1=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

postgres=# \c nagdb
You are now connected to database "nagdb" as user "postgres".
nagdb=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

nagdb=# \c archive_old
You are now connected to database "books_old" as user "postgres".
books_old=# \dx
                            List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+-----------------------------------------------------------
 pg_stat_statements | 1.1     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

archive_old=# \c production
You are now connected to database "blurb_production" as user "postgres".
production=# \dx
                            List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+-----------------------------------------------------------
 hstore             | 1.1     | public     | data type for storing sets of (key, value) pairs
 pg_stat_statements | 1.1     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 uuid-ossp          | 1.0     | public     | generate universally unique identifiers (UUIDs)
(4 rows)

Thanks,

On Thu, Feb 28, 2019 at 11:04 AM Sergei Kornilov wrote:
> Hi
>
> > Yes, i want to get rid of old extension, Could you please share the
> query to find extension which is using pg_reorg.
>
> pg_reorg is name for both tool and extension.
> Check every database in cluster with, for example, psql command "\dx" or > read pg_dumpall -s output for some CREATE EXTENSION statements to find all > installed extensions. > > regards, Sergei >
Re: Question about pg_upgrade from 9.2 to X.X
Hi

> Yes, i want to get rid of old extension, Could you please share the query to find extension which is using pg_reorg.

pg_reorg is the name of both the tool and the extension.
Check every database in the cluster with, for example, the psql command
"\dx", or read pg_dumpall -s output for CREATE EXTENSION statements to
find all installed extensions.

regards, Sergei
Re: [RFC] [PATCH] Flexible "partition pruning" hook
On Wed, Feb 27, 2019 at 12:36 PM Peter Eisentraut wrote: > > To rephrase this: You have a partitioned table, and you have a RLS > policy that hides certain rows, and you know based on your business > logic that under certain circumstances entire partitions will be hidden, > so they don't need to be scanned. So you want a planner hook that would > allow you to prune those partitions manually. > > That sounds pretty hackish to me. We should give the planner and > executor the appropriate information to make these decisions, like an > additional partition constraint. Are you thinking of a partition pruning step for FuncExpr's or something else? I was considering an implementation where FuncExpr's were marked for execution-time pruning, but wanted to see if this patch got any traction first. > If this information is hidden in > user-defined functions in a way that cannot be reasoned about, who is > enforcing these constraints and how do we know they are actually correct? The author of the extension which utilizes the hook would have to be sure they use the hook correctly. This is not a new or different concept to any other existing hook. This hook in particular would be used by security extensions that have some understanding of the underlying security model being implemented by RLS. Thanks, -- Mike Palmiotto Software Engineer Crunchy Data Solutions https://crunchydata.com
Re: Bloom index cost model seems to be wrong
Jeff Janes writes: > Should we be trying to estimate the false positive rate and charging > cpu_tuple_cost and cpu_operator_cost the IO costs for visiting the table to > recheck and reject those? I don't think other index types do that, and I'm > inclined to think the burden should be on the user not to create silly > indexes that produce an overwhelming number of false positives. Heap-access costs are added on in costsize.c, not in the index cost estimator. I don't remember at the moment whether there's any explicit accounting for lossy indexes (i.e. false positives). Up to now, there haven't been cases where we could estimate the false-positive rate with any accuracy, so we may just be ignoring the effect. But if we decide to account for it, I'd rather have costsize.c continue to add on the actual cost, perhaps based on a false-positive-rate fraction returned by the index estimator. regards, tom lane
Re: Question about pg_upgrade from 9.2 to X.X
Thank you very much, Sergei. Yes, I want to get rid of the old extension. Could you please share the query to find the extension which is using pg_reorg? Regards, On Thu, Feb 28, 2019 at 10:27 AM Sergei Kornilov wrote: > Hello > > pgsql-hackers seems wrong list for such question. > > > could not load library "$libdir/hstore": ERROR: could not access file > "$libdir/hstore": No such file or directory > > could not load library "$libdir/adminpack": ERROR: could not access > file "$libdir/adminpack": No such file or directory > > could not load library "$libdir/uuid-ossp": ERROR: could not access > file "$libdir/uuid-ossp": No such file or directory > > > > Observation : the above Libraries are present in 9.2 whereas its mising > in 10.7. So i decided to go with lower version. > > This is contrib modules. They can be shipped in separate package, > postgresql10-contrib.x86_64 for example (in centos repo) > > > Second i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but > its failed with following error during Check Mode. > > > > could not load library "$libdir/pg_reorg": > > ERROR: could not access file "$libdir/pg_reorg": No such file or > directory > > > > Observation : In this case , pg_reorg is not present on both Source and > Target . But strange its failing. > > This is 3rd-party extension. Best way would be drop this extension on old > cluster and perform upgrade. pg_reorg is abandoned for years, pg_repack is > live fork if you need such tool. > > regards, Sergei >
Re: partitioned tables referenced by FKs
On 2019-Feb-28, Amit Langote wrote: > Hi Alvaro, > > I looked at the latest patch and most of the issues/bugs that I was going > to report based on the late January version of the patch seem to have been > taken care of; sorry that I couldn't post sooner which would've saved you > some time. The patch needs to be rebased on top of ff11e7f4b9 which > touched ri_triggers.c. Rebased to current master. I'll reply later to handle the other issues you point out. Given the comments on 0002 in this thread and elsewhere, I'm inclined to push that one today with minor tweaks. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From 1a0b9c9a699b2c80ebc295513e07db362b39fe0d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 28 Nov 2018 11:52:00 -0300
Subject: [PATCH v5 1/5] Rework deleteObjectsInList to allow objtype-specific
 checks

This doesn't change any functionality yet.
---
 src/backend/catalog/dependency.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2048d71535b..0b4c47b808c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -230,29 +230,38 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	int			i;
 
 	/*
-	 * Keep track of objects for event triggers, if necessary.
+	 * Invoke pre-deletion callbacks and keep track of objects for event
+	 * triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
+	for (i = 0; i < targetObjects->numrefs; i++)
 	{
-		for (i = 0; i < targetObjects->numrefs; i++)
+		const ObjectAddress *thisobj = &targetObjects->refs[i];
+		Oid			objectClass = getObjectClass(thisobj);
+
+		if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
 		{
-			const ObjectAddress *thisobj = &targetObjects->refs[i];
-			const ObjectAddressExtra *extra = &targetObjects->extras[i];
-			bool		original = false;
-			bool		normal = false;
-
-			if (extra->flags & DEPFLAG_ORIGINAL)
-				original = true;
-			if (extra->flags & DEPFLAG_NORMAL)
-				normal = true;
-			if (extra->flags & DEPFLAG_REVERSE)
-				normal = true;
-
-			if (EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+			if (EventTriggerSupportsObjectClass(objectClass))
 			{
+				bool		original = false;
+				bool		normal = false;
+				const ObjectAddressExtra *extra = &targetObjects->extras[i];
+
+				if (extra->flags & DEPFLAG_ORIGINAL)
+					original = true;
+				if (extra->flags & DEPFLAG_NORMAL ||
+					extra->flags & DEPFLAG_REVERSE)
+					normal = true;
+
 				EventTriggerSQLDropAddObject(thisobj, original, normal);
 			}
 		}
+
+		/* Invoke class-specific pre-deletion checks */
+		switch (objectClass)
+		{
+			default:
+				break;
+		}
+	}
 
 	/*
-- 
2.17.1

From f353acaffaf275df623e70c296d5833bb9f318e7 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Tue, 22 Jan 2019 18:00:31 -0300
Subject: [PATCH v5 2/5] index_get_partition

---
 src/backend/catalog/partition.c  | 35 +++++++++++++++++++++++++++++++++++
 src/backend/commands/tablecmds.c | 40 +++++++++-----------------------------
 src/include/catalog/partition.h  |  1 +
 3 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 3ccdaff8c45..f9282587f8b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -145,6 +145,41 @@ get_partition_ancestors_worker(Relation inhRel, Oid relid, List **ancestors)
 	get_partition_ancestors_worker(inhRel, parentOid, ancestors);
 }
 
+/*
+ * Return the OID of the index, in the given partition, that is a child of the
+ * given index, or InvalidOid if there isn't one.
+ */
+Oid
+index_get_partition(Relation partition, Oid indexId)
+{
+	List	   *idxlist = RelationGetIndexList(partition);
+	ListCell   *l;
+
+	foreach(l, idxlist)
+	{
+		Oid			partIdx = lfirst_oid(l);
+		HeapTuple	tup;
+		Form_pg_class classForm;
+		bool		ispartition;
+
+		tup = SearchSysCache1(RELOID, ObjectIdGetDatum(partIdx));
+		if (!tup)
+			elog(ERROR, "cache lookup failed for relation %u", partIdx);
+		classForm = (Form_pg_class) GETSTRUCT(tup);
+		ispartition = classForm->relispartition;
+		ReleaseSysCache(tup);
+		if (!ispartition)
+			continue;
+		if (get_partition_parent(lfirst_oid(l)) == indexId)
+		{
+			list_free(idxlist);
+			return partIdx;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * map_partition_varattnos - maps varattno of any Vars in expr from the
  * attno's of 'from_rel' to the attno's of 'to_rel' partition, each of which
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a93b13c2fe4..251c5cd3fa1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15454,36 +15454,18 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 static void
 refuseDupeIndexAttach(Relation parentIdx,
Re: Question about pg_upgrade from 9.2 to X.X
Hello pgsql-hackers seems the wrong list for such a question. > could not load library "$libdir/hstore": ERROR: could not access file > "$libdir/hstore": No such file or directory > could not load library "$libdir/adminpack": ERROR: could not access file > "$libdir/adminpack": No such file or directory > could not load library "$libdir/uuid-ossp": ERROR: could not access file > "$libdir/uuid-ossp": No such file or directory > > Observation : the above Libraries are present in 9.2 whereas its mising in > 10.7. So i decided to go with lower version. These are contrib modules. They can be shipped in a separate package, postgresql10-contrib.x86_64 for example (in the CentOS repo). > Second i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but its > failed with following error during Check Mode. > > could not load library "$libdir/pg_reorg": > ERROR: could not access file "$libdir/pg_reorg": No such file or directory > > Observation : In this case , pg_reorg is not present on both Source and > Target . But strange its failing. This is a 3rd-party extension. The best way would be to drop this extension on the old cluster and perform the upgrade. pg_reorg has been abandoned for years; pg_repack is the live fork if you need such a tool. regards, Sergei
Re: Question about pg_upgrade from 9.2 to X.X
Thanks, Mahendra, for the quick response. I have followed the same procedure; the only difference is that I didn't bring up the source (9.2). But I am not sure how that will resolve the libraries issue. All I tried was with --check mode only. Thanks, On Thu, Feb 28, 2019 at 10:23 AM Mahendra Singh wrote: > Hi > Please try with below commands. > > Let we want to upgrade v6 to v11. > Note: I installed my binary inside result folder. > > export OLDCLUSTER=./6_EDBAS/EDBAS/result > export NEWCLUSTER=./11_EDBAS/EDBAS/result > ./11_EDBAS/EDBAS/result/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin > --new-bindir=$NEWCLUSTER/bin --old-datadir=$OLDCLUSTER/bin/data > --new-datadir=$NEWCLUSTER/bin/data > > Note: old server should be in running state and new server should not be > in running state. > > Thanks and Regards > Mahendra > > On Thu, 28 Feb 2019 at 23:44, Perumal Raj wrote: > >> Dear SMEs >> >> I have finally decided to move forward after great hospitality in Version >> 9.2.24 :-) >> >> First i attempted to upgrade from 9.2.24 to 10.7, but its failed with >> following error during Check Mode. >> >> could not load library "$libdir/hstore": ERROR: could not access file >> "$libdir/hstore": No such file or directory >> could not load library "$libdir/adminpack": ERROR: could not access file >> "$libdir/adminpack": No such file or directory >> could not load library "$libdir/uuid-ossp": ERROR: could not access file >> "$libdir/uuid-ossp": No such file or directory >> >> Observation : the above Libraries are present in 9.2 whereas its mising >> in 10.7. So i decided to go with lower version. >> >> Second i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but >> its failed with following error during Check Mode. >> >> could not load library "$libdir/pg_reorg": >> ERROR: could not access file "$libdir/pg_reorg": No such file or >> directory >> >> Observation : In this case , pg_reorg is not present on both Source and >> Target . But strange its failing. 
>> >> >> Method Used : pg_upgrade >> >> Could you please share some light here to get rid of library issue . >> >> Thanks, in advance , >> Raju >> >>
Re: plpgsql variable named as SQL keyword
On Thu, Feb 28, 2019 at 19:20, Tom Lane wrote: > Pavel Stehule writes: > > Maybe we should to disallow variables named as sql reserved keyword. > > That would just break existing code. There are lots of other > examples where you can get away with such things. > > We've expended quite a lot of sweat to avoid reserving more names than > we had to in plpgsql. I'm disinclined to throw that away just because > somebody found an error message confusing. It's not like reserving > "offset" would cause this case to work. > I partially solved it with a new warning in plpgsql_check: https://github.com/okbob/plpgsql_check/commit/5b9ef57d570c1d11fb92b9cff76655a03767f662

postgres=# select * from plpgsql_check_function('omega.foo(int, int, int)');
                           plpgsql_check_function
-----------------------------------------------------------------------------
 warning:0:3:statement block:name of variable "offset" is reserved keyword
   Detail: The reserved keyword was used as variable name.
 error:42601:4:RETURN:query "SELECT offset + 1" returned 0 columns
(3 rows)

I understand that it has no simple solution (or had none). I reported it more or less for the record. Thank you for the reply. Pavel > regards, tom lane >
Re: Drop type "smgr"?
On 2019-02-28 13:16:02 -0500, Tom Lane wrote: > Shawn Debnath writes: > > On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote: > >> Also, I don't see why we'd need a fake pg_database row in the first > >> place. IIUC, the OID counter wraps around to FirstNormalObjectId, so > >> nobody should have a database with an OID less than that value. > > > We have scripts under catalog directory that can check to ensure OIDs > > aren't re-used accidentally. However, we still have to define an entry > > in a catalog somewhere and I was proposing creating a new one, > > pg_storage_managers?, to track these entries. > > That would fail to capture the property that the smgr OIDs mustn't > conflict with database OIDs, so the whole thing still seems like an > ugly kluge from here. It's definitely a kludge, but it doesn't seem that bad for now. Delaying a nice answer till we have a more efficient bufmgr representation doesn't seem crazy to me. I don't think there's a real conflict risk given that we don't allow for duplicated oids across catalogs at bootstrap time, and this is definitely a bootstrap time issue. > > Another thought: my colleague Anton Shyrabokau suggested potentially > > re-using forknumber to differentiate smgrs. We are using 32 bits to > > map 5 entries today. > > Yeah, that seems like it might be a workable answer. Yea, if we just split that into two 16bit entries, there'd not be much lost. Some mild performance regression due to more granular memory->register reads/writes, but I can't even remotely see that matter. > Since, per this discussion, the smgr IDs need not really be OIDs at > all, we just need a few bits for them. Personally I find having them as oids more elegant than the distaste from misusing the database oid for a while, but I think it's fair to disagree here. Greetings, Andres Freund
Re: Question about pg_upgrade from 9.2 to X.X
Hi Please try the below commands. Say we want to upgrade v6 to v11. Note: I installed my binaries inside the result folder. export OLDCLUSTER=./6_EDBAS/EDBAS/result export NEWCLUSTER=./11_EDBAS/EDBAS/result ./11_EDBAS/EDBAS/result/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin --new-bindir=$NEWCLUSTER/bin --old-datadir=$OLDCLUSTER/bin/data --new-datadir=$NEWCLUSTER/bin/data Note: the old server should be in a running state and the new server should not be in a running state. Thanks and Regards Mahendra On Thu, 28 Feb 2019 at 23:44, Perumal Raj wrote: > Dear SMEs > > I have finally decided to move forward after great hospitality in Version > 9.2.24 :-) > > First i attempted to upgrade from 9.2.24 to 10.7, but its failed with > following error during Check Mode. > > could not load library "$libdir/hstore": ERROR: could not access file > "$libdir/hstore": No such file or directory > could not load library "$libdir/adminpack": ERROR: could not access file > "$libdir/adminpack": No such file or directory > could not load library "$libdir/uuid-ossp": ERROR: could not access file > "$libdir/uuid-ossp": No such file or directory > > Observation : the above Libraries are present in 9.2 whereas its mising in > 10.7. So i decided to go with lower version. > > Second i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but > its failed with following error during Check Mode. > > could not load library "$libdir/pg_reorg": > ERROR: could not access file "$libdir/pg_reorg": No such file or directory > > Observation : In this case , pg_reorg is not present on both Source and > Target . But strange its failing. > > > Method Used : pg_upgrade > > Could you please share some light here to get rid of library issue . > > Thanks, in advance , > Raju > >
Re: plpgsql variable named as SQL keyword
Pavel Stehule writes: > Maybe we should to disallow variables named as sql reserved keyword. That would just break existing code. There are lots of other examples where you can get away with such things. We've expended quite a lot of sweat to avoid reserving more names than we had to in plpgsql. I'm disinclined to throw that away just because somebody found an error message confusing. It's not like reserving "offset" would cause this case to work. regards, tom lane
Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns
Hi, Thanks for the quick response. On 2019-02-28 18:28:37 +0900, Etsuro Fujita wrote: > > I'm currently > > converting the EPQ machinery to slots, and in course of that I (with > > Horiguchi-san's help), converted RefetchForeignRow to return a slot. But > > there's currently no in-core user of this facility... I guess I can > > rebase the preliminary postgres_fdw patch here, but it bitrotted > > significantly. > > I'll rebase that patch and help the testing, if you want me to. That'd be awesome. > > I also feel like there should be some test coverage for > > an API in a nontrivial part of the code... > > Yeah, but as mentioned above, the row-locking API is provided for FDWs > operating against local storage, which we don't have in core, unfortunately. Yea. file_fdw exists, but doesn't support modifications... Greetings, Andres Freund
Re: Drop type "smgr"?
Shawn Debnath writes: > On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote: >> Also, I don't see why we'd need a fake pg_database row in the first >> place. IIUC, the OID counter wraps around to FirstNormalObjectId, so >> nobody should have a database with an OID less than that value. > We have scripts under catalog directory that can check to ensure OIDs > aren't re-used accidentally. However, we still have to define an entry > in a catalog somewhere and I was proposing creating a new one, > pg_storage_managers?, to track these entries. That would fail to capture the property that the smgr OIDs mustn't conflict with database OIDs, so the whole thing still seems like an ugly kluge from here. > Another thought: my colleague Anton Shyrabokau suggested potentially > re-using forknumber to differentiate smgrs. We are using 32 bits to > map 5 entries today. Yeah, that seems like it might be a workable answer. I really do think that relying on magic database OIDs is a bad idea; if you think you aren't going to need a real database ID in there in the future, you're being short-sighted. And I guess the same argument establishes that relying on magic tablespace OIDs would also be a bad idea, even if there weren't a near-term proposal on the table for needing real tablespace IDs in an alternate smgr. So we need to find some bits somewhere else, and the fork number field is the obvious candidate. Since, per this discussion, the smgr IDs need not really be OIDs at all, we just need a few bits for them. regards, tom lane
Re: Drop type "smgr"?
Hi, On 2019-02-28 10:02:46 -0800, Shawn Debnath wrote: > We have scripts under catalog directory that can check to ensure OIDs > aren't re-used accidentally. However, we still have to define an entry > in a catalog somewhere and I was proposing creating a new one, > pg_storage_managers?, to track these entries. See [1] for previous > discussion on this topic. We wouldn't need to do catalog lookups for > being able to use the smgrs as the OIDs will be hardcoded in C, but the > data will be available for posterity and OID reservation. I'm inclined to just put them in pg_am, with a new type 's' (we already have amtype = i for indexes, I'm planning to add 't' for tables soon). While not a perfect fit for storage managers, it seems to fit well enough. > Another thought: my colleague Anton Shyrabokau suggested potentially > re-using forknumber to differentiate smgrs. We are using 32 bits to > map 5 entries today. This approach would be similar to how we split up > the segment numbers and use the higher bits to identify forget or > unlink requests for checkpointer. That could probably be done, without incurring too much overhead here. I'm not sure that the added complexity around the tree is worth it however. I personally would just go with the DB oid for the near future, where we don't need per-database storage for those. And then plan for buffer manager improvements. Greetings, Andres Freund
Question about pg_upgrade from 9.2 to X.X
Dear SMEs I have finally decided to move forward after great hospitality in Version 9.2.24 :-) First I attempted to upgrade from 9.2.24 to 10.7, but it failed with the following errors during check mode: could not load library "$libdir/hstore": ERROR: could not access file "$libdir/hstore": No such file or directory could not load library "$libdir/adminpack": ERROR: could not access file "$libdir/adminpack": No such file or directory could not load library "$libdir/uuid-ossp": ERROR: could not access file "$libdir/uuid-ossp": No such file or directory Observation: the above libraries are present in 9.2 whereas they are missing in 10.7, so I decided to go with a lower version. Second, I attempted to upgrade from 9.2.24 to 9.6.12, 9.4, and 9.3, but it failed with the following error during check mode: could not load library "$libdir/pg_reorg": ERROR: could not access file "$libdir/pg_reorg": No such file or directory Observation: In this case, pg_reorg is not present on either the source or the target, yet strangely it is failing. Method used: pg_upgrade. Could you please shed some light here on how to get rid of the library issue? Thanks in advance, Raju
Re: Remove Deprecated Exclusive Backup Mode
On 2/27/19 8:22 PM, Christophe Pettus wrote: On Feb 26, 2019, at 11:38, Magnus Hagander wrote: That should not be a wiki page, really, that should be part of the main documentation. I was just suggesting using a wiki page to draft it before we drop it into the main documentation. I'm open to other suggestions, of course! It seems to me that the best way to discuss this is via a patch to the main documentation. I've written something to get us started: https://commitfest.postgresql.org/22/2042/ -- -David da...@pgmasters.net
Re: Bloom index cost model seems to be wrong
On Sun, Feb 24, 2019 at 11:09 AM Jeff Janes wrote: > I've moved this to the hackers list, and added Teodor and Alexander of the > bloom extension, as I would like to hear their opinions on the costing. > My previous patch had accidentally included a couple of lines of a different thing I was working on (a memory leak fix, now committed), so this patch removes that diff. I'm adding it to the commitfest targeting v13. I'm more interested in feedback on the conceptual issues rather than stylistic ones, as I would probably merge the two functions together before proposing something to actually be committed. Should we be trying to estimate the false positive rate and charging cpu_tuple_cost and cpu_operator_cost and the IO costs for visiting the table to recheck and reject those? I don't think other index types do that, and I'm inclined to think the burden should be on the user not to create silly indexes that produce an overwhelming number of false positives. Cheers, Jeff bloom_cost_v3.patch Description: Binary data
plpgsql variable named as SQL keyword
Hi one user of plpgsql_check reported interesting error message create or replace function omega.foo(a int) returns int as $$ declare offset integer := 0; begin return offset + 1; end; $$ language plpgsql; postgres=# select omega.foo(10); ERROR: query "SELECT offset + 1" returned 0 columns CONTEXT: PL/pgSQL function omega.foo(integer) line 4 at RETURN Maybe we should to disallow variables named as sql reserved keyword. Regards Pavel
Re: Drop type "smgr"?
Robert Haas writes: > On Thu, Feb 28, 2019 at 1:03 AM Thomas Munro wrote: >> Nothing seems to break if you remove it (except for some tests using >> it in an incidental way). See attached. > FWIW, +1 from me. To be clear, I'm not objecting to the proposed patch either. I was just wondering where we plan to go from here, given that smgr.c wasn't getting removed. BTW, there is stuff in src/backend/storage/smgr/README that is already obsoleted by this patch, and more that might be obsoleted if development proceeds as discussed here. So that needs a look. regards, tom lane
Re: Drop type "smgr"?
On Thu, Feb 28, 2019 at 11:06 AM Tom Lane wrote: > It's certainly possible/likely that we're going to end up needing to > widen buffer tags to represent the smgr explicitly, because some use > cases are going to need a real database spec, some are going to need > a real tablespace spec, and some might need both. Maybe we should > just bite that bullet. Well, Andres will probably complain about that. He thinks, IIUC, that the buffer tags are too wide already and that it's significantly hurting performance on very very common operations - like buffer lookups. I haven't verified that myself, but I tend to think he knows what he's talking about. Anyway, given that your argument started off from the premise that we had to have a pg_database row, I think we'd better look a little harder at whether that premise is correct before getting too excited here. As I said in my earlier reply, I think that we probably don't need to have a pg_database row given that we wrap around to FirstNormalObjectId; any value we hard-code would be less than that. If there are other reasons why we'd need that, it might be useful to hear about them. However, all we really need to decide on this thread is whether we need 'smgr' exposed as an SQL type. And I can't see why we need that no matter how all of the rest of this turns out. Nobody is currently proposing to give users a choice of smgrs, just to use them for internal stuff. Even if that changed later, it doesn't necessarily mean we'd add back an SQL type, or that if we did it would look like the one we have today. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Row Level Security − leakproof-ness and performance implications
On 2/28/19 12:28 PM, Robert Haas wrote: > Mmmph. If your customers always have a non-production instance where > problems from production can be easily reproduced, your customers are > not much like our customers. Well, I certainly did not mean to imply that this is always the case ;-) But I think it is fair to tell customers who have these tradeoffs in front of them that a non-production instance would be even more advisable if they decide to use this capability. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Row Level Security − leakproof-ness and performance implications
On Thu, Feb 28, 2019 at 12:05 PM Joe Conway wrote: > I think that would affect the server logs too, no? Worth thinking about > though... Yeah, I suppose so, although there might be a way to work around that. > Also manually marking all functions leakproof is far less convenient > than turning off the check as this patch effectively allows. You would > want to keep track of the initial condition and be able to restore it if > needed. Doable but much uglier. Perhaps we could tolerate a hook that > would allow an extension to do this though? Yeah, possibly. I guess we'd have to see how ugly that looks, but > Again, remember that the actual messages are available in the server > logs. The presumption is that the server logs are kept secure, and it is > ok to leak information into them. How the customer does or does not > decide to pass some of that information on to a support group becomes a > problem to deal with on a case by case basis. > > Also, as mentioned up-thread, in many cases there is or should be a > non-production instance available to use for reproducing problems to > debug them. Presumably the data on such a system is faked or has already > been cleaned up for a wider audience. Mmmph. If your customers always have a non-production instance where problems from production can be easily reproduced, your customers are not much like our customers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Row Level Security − leakproof-ness and performance implications
On 2/28/19 11:50 AM, Robert Haas wrote: > On Thu, Feb 28, 2019 at 11:44 AM Joe Conway wrote: >> No, and Tom stated as much too, but life is all about tradeoffs. Some >> people will find this an acceptable compromise. For those that don't >> they don't have to use it. IMHO we tend toward too much nannyism too often. > > Well, I agree with that, too. > > Hmm. I don't think there's anything preventing you from implementing > this in "userspace," is there? A logging hook could suppress all > error message text, and you could just mark all functions leakproof > after that, and you'd have this exact behavior in an existing release > with no core code changes, I think. I think that would affect the server logs too, no? Worth thinking about though... Also manually marking all functions leakproof is far less convenient than turning off the check as this patch effectively allows. You would want to keep track of the initial condition and be able to restore it if needed. Doable but much uglier. Perhaps we could tolerate a hook that would allow an extension to do this though? > If you do that, or just stick this patch into your own distro, I would > be interested to hear some experiences from customers (and those who > support them) after some time had gone by. I find it hard to imagine > delivering customer support in an environment configured this way, but > sometimes my imagination is limited. Again, remember that the actual messages are available in the server logs. The presumption is that the server logs are kept secure, and it is ok to leak information into them. How the customer does or does not decide to pass some of that information on to a support group becomes a problem to deal with on a case by case basis. Also, as mentioned up-thread, in many cases there is or should be a non-production instance available to use for reproducing problems to debug them. Presumably the data on such a system is faked or has already been cleaned up for a wider audience. 
Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Row Level Security − leakproof-ness and performance implications
On Thu, Feb 28, 2019 at 11:44 AM Joe Conway wrote: > No, and Tom stated as much too, but life is all about tradeoffs. Some > people will find this an acceptable compromise. For those that don't > they don't have to use it. IMHO we tend toward too much nannyism too often. Well, I agree with that, too. Hmm. I don't think there's anything preventing you from implementing this in "userspace," is there? A logging hook could suppress all error message text, and you could just mark all functions leakproof after that, and you'd have this exact behavior in an existing release with no core code changes, I think. If you do that, or just stick this patch into your own distro, I would be interested to hear some experiences from customers (and those who support them) after some time had gone by. I find it hard to imagine delivering customer support in an environment configured this way, but sometimes my imagination is limited. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Row Level Security − leakproof-ness and performance implications
On 2/28/19 11:37 AM, Robert Haas wrote: > On Thu, Feb 28, 2019 at 11:14 AM Joe Conway wrote: >> > Although, and Joe may hate me for saying this, I think only the >> > non-constants should be redacted to keep some level of usability for >> > regular SQL errors. Maybe system errors like the above should be >> > removed from client messages in general. >> >> I started down this path and it looked fragile. I guess if there is >> generally enough support to think this might be viable I could open up >> that door again, but I don't want to waste time if the approach is >> really a non-starter as stated upthread :-/. > > Hmm. It seems to me that if there's a function that sometimes throws > an error and other times does not, and if that behavior is dependent > on the input, then even redacting the error message down to 'ERROR: > error' does not remove the leak. So it seems to me that regardless of > what one thinks about the proposal from a usability perspective, it's > probably not correct from a security standpoint. Information that > couldn't be leaked until present rules would leak with this change, > when the new GUCs were turned on. > > Am I wrong? No, and Tom stated as much too, but life is all about tradeoffs. Some people will find this an acceptable compromise. For those that don't they don't have to use it. IMHO we tend toward too much nannyism too often. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: Row Level Security − leakproof-ness and performance implications
On Thu, Feb 28, 2019 at 11:14 AM Joe Conway wrote: > > Although, and Joe may hate me for saying this, I think only the > > non-constants should be redacted to keep some level of usability for > > regular SQL errors. Maybe system errors like the above should be > > removed from client messages in general. > > I started down this path and it looked fragile. I guess if there is > generally enough support to think this might be viable I could open up > that door again, but I don't want to waste time if the approach is > really a non-starter as stated upthread :-/. Hmm. It seems to me that if there's a function that sometimes throws an error and other times does not, and if that behavior is dependent on the input, then even redacting the error message down to 'ERROR: error' does not remove the leak. So it seems to me that regardless of what one thinks about the proposal from a usability perspective, it's probably not correct from a security standpoint. Information that couldn't be leaked under present rules would leak with this change, when the new GUCs were turned on. Am I wrong? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company