Re: PG 12 beta 1 segfault during analyze
Steve Singer writes:
> I encountered the following segfault when running against a PG 12 beta1
> during an analyze against a table.

Nobody else has reported this, so you're going to have to work on producing a self-contained test case, or else debugging it yourself.

regards, tom lane
PG 12 beta 1 segfault during analyze
I encountered the following segfault when running against a PG 12 beta1 during an analyze against a table.

#0  0x56008ad0c826 in update_attstats (vacattrstats=0x0, natts=2139062143, inh=false, relid=0x40>) at analyze.c:572
#1  do_analyze_rel (onerel=onerel@entry=0x7f0bc59a7a38, params=params@entry=0x7ffe06aeabb0, va_cols=va_cols@entry=0x0, acquirefunc=, relpages=8, inh=inh@entry=false, in_outer_xact=false, elevel=13) at analyze.c:572
#2  0x56008ad0d2e0 in analyze_rel (relid=, relation=, params=params@entry=0x7ffe06aeabb0, va_cols=0x0, in_outer_xact=, bstrategy=) at analyze.c:260
#3  0x56008ad81300 in vacuum (relations=0x56008c4d1110, params=params@entry=0x7ffe06aeabb0, bstrategy=, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:413
#4  0x56008ad8197f in ExecVacuum (pstate=pstate@entry=0x56008c5c2688, vacstmt=vacstmt@entry=0x56008c3e0428, isTopLevel=isTopLevel@entry=true) at vacuum.c:199
#5  0x56008af0133b in standard_ProcessUtility (pstmt=0x56008c982e50, queryString=0x56008c3df368 "select \"_disorder_replica\".finishTableAfterCopy(3); analyze \"disorder\".\"do_inventory\"; ", context=, params=0x0, queryEnv=0x0, dest=0x56008c9831d8, completionTag=0x7ffe06aeaef0 "") at utility.c:670
#6  0x56008aefe112 in PortalRunUtility (portal=0x56008c4515f8, pstmt=0x56008c982e50, isTopLevel=, setHoldSnapshot=, dest=, completionTag=0x7ffe06aeaef0 "") at pquery.c:1175
#7  0x56008aefec91 in PortalRunMulti (portal=portal@entry=0x56008c4515f8, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x56008c9831d8, altdest=altdest@entry=0x56008c9831d8, completionTag=completionTag@entry=0x7ffe06aeaef0 "") at pquery.c:1328
#8  0x56008aeff9e9 in PortalRun (portal=portal@entry=0x56008c4515f8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x56008c9831d8, altdest=altdest@entry=0x56008c9831d8, completionTag=0x7ffe06aeaef0 "") at pquery.c:796
#9  0x56008aefb6bb in exec_simple_query (query_string=0x56008c3df368 "select \"_disorder_replica\".finishTableAfterCopy(3); analyze \"disorder\".\"do_inventory\"; ") at postgres.c:1215

With master from today (aa087ec64f703a52f3c48c) I still get segfaults under do_analyze_rel:

#0  compute_index_stats (onerel=0x7f84bf1436a8, col_context=0x55a5d3d56640, numrows=, rows=0x55a5d4039520, nindexes=, indexdata=0x3ff0, totalrows=500) at analyze.c:711
#1  do_analyze_rel (onerel=onerel@entry=0x7f84bf1436a8, params=0x7ffdde2b5c40, params@entry=0x3ff0, va_cols=va_cols@entry=0x0, acquirefunc=, relpages=11, inh=inh@entry=false, in_outer_xact=true, elevel=13) at analyze.c:552
Re: The flinfo->fn_extra question, from me this time.
On 06/15/19 21:21, Tom Lane wrote:
> Yup. (Of course, you don't have to use the SRF_FIRSTCALL_INIT
> infrastructure.)

That had crossed my mind ... but it seems there's around 80 or 100 lines of good stuff there that'd be a shame to duplicate.

If only init_MultiFuncCall() took an extra void ** argument, and the stock SRF_FIRSTCALL_INIT passed &(fcinfo->flinfo->fn_extra), seems like most of it would be reusable. shutdown_MultiFuncCall would need to work slightly differently, and a caller who wanted to be different would need a customized variant of SRF_PERCALL_SETUP, but that's two lines.

Cheers, -Chap
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
> On 6/13/19 11:07 AM, Bruce Momjian wrote:
> > On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
> >> Yeah, in principle since data key of 2 tier key architecture should
> >> not go outside database I think we should not tell data keys to
> >> utility commands. So the rearranging WAL format seems to be a better
> >> solution but is there any reason why the main data is placed at end of
> >> WAL record? I wonder if we can assemble WAL records as following order
> >> and encrypt only 3 and 4.
> >>
> >> 1. Header data (XLogRecord and other headers)
> >> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
> >> 3. Block data (Tuple data, FPI)
> >> 4. Sub data (e.g tuple data for logical decoding)
> >
> > Yes, that does sound like a reasonable idea. It is similar to us not
> > encrypting the clog --- there is little value. However, if we only
> > encrypt the cluster, we don't need to expose the relfilenode and we can
> > just encrypt the entire WAL --- I like that simplicity. We might find
> > that the complexity of encrypting only certain tablespaces makes the
> > system slower than just encrypting the entire cluster.
>
> There are reasons other than performance to want more granular than
> entire cluster encryption. Limiting the volume of encrypted data with
> any one key for example. And not encrypting #1 & 2 above would help
> avoid known-plaintext attacks I would think.

There are no known non-exhaustive plaintext attacks on AES:

https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks

Even if we don't encrypt the first part of the WAL record (1 & 2), the block data (3) probably has enough format for a plaintext attack.

-- Bruce Momjian  http://momjian.us  EnterpriseDB  http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
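To illustrate the record layout Sawada-san proposes above (encrypt only parts 3 and 4), here is a toy Python sketch. Everything in it is illustrative: the hash-based XOR keystream is a stand-in only, not a real cipher (actual TDE would use AES), and the byte strings are placeholders for the real record parts.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Illustrative keystream only -- NOT a real cipher; a real
    # implementation would use AES (e.g. CTR mode).
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_bytes(data: bytes, ks: bytes) -> bytes:
    # XOR with the same keystream both encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, ks))

def assemble_record(header, main_data, block_data, sub_data, key, nonce):
    """Assemble a record in the proposed order, encrypting only
    parts 3 (block data) and 4 (sub data) as one contiguous tail."""
    tail = block_data + sub_data
    return header + main_data + xor_bytes(tail, keystream(key, nonce, len(tail)))

# Placeholder stand-ins for the four record parts:
hdr, maind = b"XLogRecord|", b"xl_heap_insert|"
blk, sub = b"tuple data, FPI|", b"tuple data for logical decoding"

rec = assemble_record(hdr, maind, blk, sub, b"key", b"nonce")
clear_len = len(hdr) + len(maind)

# Headers and main data remain readable without the key ...
assert rec[:clear_len] == hdr + maind
# ... while the tail is not, and decrypting it restores block/sub data.
assert rec[clear_len:] != blk + sub
decrypted = xor_bytes(rec[clear_len:], keystream(b"key", b"nonce", len(blk) + len(sub)))
assert decrypted == blk + sub
```

The point of the layout is visible in the asserts: utility code that must parse headers and main data can do so without any key, while tuple payloads stay opaque.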
Re: The flinfo->fn_extra question, from me this time.
Chapman Flack writes:
> So please let me know if I seem to correctly understand the limits
> on its use.
> I gather that various extensions use it to stash various things. But
> (I assume) ... they will only touch fn_extra in FmgrInfo structs that
> pertain to *their own functions*. (Please say that's true?)
> IOW, it is assured that, if I am a language handler, when I am called
> to handle a function in my language, fn_extra is mine to use as I please ...

Yup.

> ... with the one big exception, if I am handling a function in my language
> that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
> fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
> and then I can stash my stuff in gunk->user_fctx for the duration of that
> SRF call.

Yup. (Of course, you don't have to use the SRF_FIRSTCALL_INIT infrastructure.)

Keep in mind that in most contexts, whatever you cache in fn_extra will only be there for the life of the current query.

regards, tom lane
The flinfo->fn_extra question, from me this time.
Hi hackers,

I see evidence on this list that it's sort of a rite of passage to ask the flinfo->fn_extra question, and my time has come. So please let me know if I seem to correctly understand the limits on its use.

I gather that various extensions use it to stash various things. But (I assume) ... they will only touch fn_extra in FmgrInfo structs that pertain to *their own functions*. (Please say that's true?)

IOW, it is assured that, if I am a language handler, when I am called to handle a function in my language, fn_extra is mine to use as I please ...

... with the one big exception, if I am handling a function in my language that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there, and then I can stash my stuff in gunk->user_fctx for the duration of that SRF call.

Does that seem to catch the essentials?

Thanks, -Chap

p.s.: noticed in fmgr/README: "Note that simple "strict" functions can ignore both isnull and args[i].isnull, since they won't even get called when there are any TRUE values in args[].isnull." I get why a strict function can ignore args[i].isnull, but is the part about ignoring isnull a mistake? A strict function can be passed all non-null arguments and still return null if it wants to, right?
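The contract discussed in this thread can be modeled outside the backend. Below is a toy Python analogy (none of these classes are PostgreSQL API; FmgrInfo/FuncCallContext here are simplified stand-ins): the multi-call machinery plants its state in an fn_extra slot that must start out empty, and the handler keeps its own iterator state in user_fctx, mirroring the SRF_FIRSTCALL_INIT / SRF_PERCALL_SETUP / SRF_RETURN_NEXT sequence for value-per-call mode.

```python
class FmgrInfo:
    """Toy stand-in for FmgrInfo: fn_extra is a scratch slot owned by
    whoever is handling this particular function."""
    def __init__(self, func_name):
        self.func_name = func_name
        self.fn_extra = None

class FuncCallContext:
    """Toy stand-in for the SRF multi-call state planted by
    SRF_FIRSTCALL_INIT; user_fctx is the handler's private slot."""
    def __init__(self):
        self.call_cntr = 0
        self.user_fctx = None

def srf_firstcall_init(flinfo):
    # Mirrors the rule: fn_extra must be NULL before the first call;
    # the multi-call machinery then plants its own "gunk" there.
    assert flinfo.fn_extra is None
    flinfo.fn_extra = FuncCallContext()
    return flinfo.fn_extra

def handle_value_per_call(flinfo):
    """A 'handler' returning one row per call, stashing its iterator
    state in funcctx.user_fctx for the duration of the SRF call."""
    if flinfo.fn_extra is None:
        funcctx = srf_firstcall_init(flinfo)
        funcctx.user_fctx = iter([10, 20, 30])   # handler-private state
    funcctx = flinfo.fn_extra                    # ~ SRF_PERCALL_SETUP
    try:
        row = next(funcctx.user_fctx)
        funcctx.call_cntr += 1
        return row                               # ~ SRF_RETURN_NEXT
    except StopIteration:
        flinfo.fn_extra = None                   # ~ SRF_RETURN_DONE
        return None

flinfo = FmgrInfo("my_srf")
rows = []
while (r := handle_value_per_call(flinfo)) is not None:
    rows.append(r)
assert rows == [10, 20, 30]
assert flinfo.fn_extra is None   # slot released once the set is done
```

This is only the shape of the protocol; in the backend, memory-context lifetimes (per-call vs. multi-call) are the part a real handler must additionally get right.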
Re: Improve handling of pg_stat_statements handling of bind "IN" variables
On Sat., Jun. 15, 2019, 8:30 p.m. Greg Stark wrote:

> So what would this do for someone who explicitly writes:
>
> WHERE col = ANY ?
>
> and passes an array?

Actually, thinking about this for two more seconds, the question is what it would do with a query like:

WHERE col = ANY '1,2,3'::integer[]

Or:

WHERE col = ANY ARRAY[1,2,3]

Or whatever a language binding that fails to do parameterized queries is doing to emulate them.
Re: Improve handling of pg_stat_statements handling of bind "IN" variables
On Sat., Jun. 15, 2019, 12:29 p.m. Pavel Trukhanov wrote:

> So I don't think there's actually enough benefit to split those two apart.
>
> Now I want to do this: just add a meta info (basically a bool field)
> to the ArrayExpr struct, so on later stages we could tell if that's an
> ArrayExpr of an ARRAY or of an IN list. Plus to add ignoring updating
> Jumble for expression subtree within IN-list array.
>
> If that approach doesn't seem too bad to anyone, I would like to go
> forward and submit a patch – it seems pretty straightforward to
> implement that.

So what would this do for someone who explicitly writes:

WHERE col = ANY ?

and passes an array?
Re: Multivariate MCV stats can leak data to unprivileged users
On Thu, Jun 13, 2019 at 07:37:45PM +0200, Tomas Vondra wrote:
> ...
> OK, attached are patches fixing the issues reported by you and John
> Naylor, and squashing the parts into just two patches (catalog split
> and pg_stats_ext). Barring objections, I'll push those tomorrow.
>
> I've renamed columns in the _data catalog from 'stx' to 'stxd', which
> I think is appropriate given the "data" in catalog name.
>
> I'm wondering if we should change the examples in SGML docs (say, in
> planstats.sgml) to use the new pg_stats_ext view, instead of querying
> the catalogs directly. I've tried doing that, but I found the results
> less readable than what we currently have (especially for the MCV
> list, where it'd require matching elements in multiple arrays). So
> I've left this unchanged for now.

I've pushed those changes, after adding docs for the pg_stats_ext view.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: CREATE STATISTICS documentation bug
On Fri, Jun 14, 2019 at 03:23:29PM -0400, Robert Haas wrote:
> https://www.postgresql.org/docs/12/sql-createstatistics.html contains
> this example command:
>
> CREATE STATISTICS s2 (mcv) ON (a, b) FROM t2;
>
> But that produces:
>
> psql: ERROR: only simple column references are allowed in CREATE STATISTICS
>
> I think the parentheses around (a, b) just need to be removed.
>
> P.S. I think the fact that we print "psql: " before the ERROR here is
> useless clutter. We didn't do that in v11 and prior and I think we
> should kill it with fire.

I've pushed a fix for the docs issue.

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock
On 2019-Jun-16, Oleksii Kliukin wrote:
> Alvaro Herrera wrote:
> > On 2019-Jun-14, Alvaro Herrera wrote:
> >
> >> I think there are worse problems here. I tried the attached isolation
> >> spec. Note that the only difference in the two permutations is that s0
> >> finishes earlier in one than the other; yet the first one works fine and
> >> the second one hangs until killed by the 180s timeout. (s3 isn't
> >> released for a reason I'm not sure I understand.)
> >
> > Actually, those behaviors both seem correct to me now that I look
> > closer. So this was a false alarm. In the code before de87a084c0, the
> > first permutation deadlocks, and the second permutation hangs. The only
> > behavior change is that the first one no longer deadlocks, which is the
> > desired change.
> >
> > I'm still trying to form a case to exercise the case of skip_tuple_lock
> > having the wrong lifetime.
>
> Hm… I think it was an oversight from my part not to give skip_lock_tuple the
> same lifetime as have_tuple_lock or first_time (also initializing it to
> false at the same time). Even if now it might not break anything in an
> obvious way, a backward jump to l3 label will leave skip_lock_tuple
> uninitialized, making it very dangerous for any future code that will rely
> on its value.

But that's not the danger ... with the current coding, it's initialized to false every time through that block; that means the tuple lock will never be skipped if we jump back to l3. So the danger is that the first iteration sets the variable, then jumps back; second iteration initializes the variable again, so instead of skipping the lock, it takes it, causing a spurious deadlock.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
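The lifetime hazard described above — a flag (re)initialized inside the block that a backward jump re-enters — can be shown with a toy sketch in which a loop stands in for the `goto l3` retry. Nothing here is the actual heapam code; the function names and event strings are invented for illustration.

```python
def lock_attempts_buggy(events):
    """skip_tuple_lock initialized *inside* the retry block: the backward
    jump (next loop pass) wipes the value set on the previous pass."""
    decisions = []
    for ev in events:                 # each iteration ~ a jump back to l3
        skip_tuple_lock = False       # re-initialized every pass -- the bug
        if ev == "multixact_wait":
            skip_tuple_lock = True    # set here, then lost on the next pass
        decisions.append(skip_tuple_lock)
    return decisions

def lock_attempts_fixed(events):
    """Same lifetime as have_tuple_lock/first_time: initialized once,
    *before* the retry block, so the value survives the jump back."""
    decisions = []
    skip_tuple_lock = False
    for ev in events:
        if ev == "multixact_wait":
            skip_tuple_lock = True
        decisions.append(skip_tuple_lock)
    return decisions

events = ["multixact_wait", "retry"]
# Buggy lifetime: the skip decision is forgotten on the retry, so the
# lock gets taken after all -- the spurious-deadlock scenario.
assert lock_attempts_buggy(events) == [True, False]
# Correct lifetime: the decision sticks across the retry.
assert lock_attempts_fixed(events) == [True, True]
```

The asserts make the difference concrete: with the too-narrow lifetime, the second pass silently reverts the skip decision made on the first.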
Re: Draft back-branch release notes are up for review
On Sat, Jun 15, 2019 at 06:05:00PM -0400, Tom Lane wrote: > Noah Misch writes: > > On Sat, Jun 15, 2019 at 02:11:41PM -0700, Peter Geoghegan wrote: > >> I agree that this isn't terribly significant in general. Your proposed > >> wording seems better than what we have now, but a reference to INCLUDE > >> indexes also seems like a good idea. They are the only type of index > >> that could possibly have the issue with page deletion/VACUUM becoming > >> confused. > > > If true, that's important to mention, yes. > > Thanks for the input, guys. What do you think of > > Avoid writing an invalid empty btree index page in the unlikely case > that a failure occurs while processing INCLUDEd columns during a page > split (Peter Geoghegan) > > The invalid page would not affect normal index operations, but it > might cause failures in subsequent VACUUMs. If that has happened to > one of your indexes, recover by reindexing the index. Looks good.
Re: Draft back-branch release notes are up for review
On Sat, Jun 15, 2019 at 3:05 PM Tom Lane wrote: > Thanks for the input, guys. What do you think of > > Avoid writing an invalid empty btree index page in the unlikely case > that a failure occurs while processing INCLUDEd columns during a page > split (Peter Geoghegan) > > The invalid page would not affect normal index operations, but it > might cause failures in subsequent VACUUMs. If that has happened to > one of your indexes, recover by reindexing the index. That seems perfect. Thanks -- Peter Geoghegan
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock
Alvaro Herrera wrote: > On 2019-Jun-14, Alvaro Herrera wrote: > >> I think there are worse problems here. I tried the attached isolation >> spec. Note that the only difference in the two permutations is that s0 >> finishes earlier in one than the other; yet the first one works fine and >> the second one hangs until killed by the 180s timeout. (s3 isn't >> released for a reason I'm not sure I understand.) > > Actually, those behaviors both seem correct to me now that I look > closer. So this was a false alarm. In the code before de87a084c0, the > first permutation deadlocks, and the second permutation hangs. The only > behavior change is that the first one no longer deadlocks, which is the > desired change. > > I'm still trying to form a case to exercise the case of skip_tuple_lock > having the wrong lifetime. Hm… I think it was an oversight from my part not to give skip_lock_tuple the same lifetime as have_tuple_lock or first_time (also initializing it to false at the same time). Even if now it might not break anything in an obvious way, a backward jump to l3 label will leave skip_lock_tuple uninitialized, making it very dangerous for any future code that will rely on its value. > The fact that both permutations behave differently, even though the > only difference is where s0 commits relative to the s3_share step, is an > artifact of our unusual tuple locking implementation. Cheers, Oleksii
Re: Draft back-branch release notes are up for review
Noah Misch writes: > On Sat, Jun 15, 2019 at 02:11:41PM -0700, Peter Geoghegan wrote: >> I agree that this isn't terribly significant in general. Your proposed >> wording seems better than what we have now, but a reference to INCLUDE >> indexes also seems like a good idea. They are the only type of index >> that could possibly have the issue with page deletion/VACUUM becoming >> confused. > If true, that's important to mention, yes. Thanks for the input, guys. What do you think of Avoid writing an invalid empty btree index page in the unlikely case that a failure occurs while processing INCLUDEd columns during a page split (Peter Geoghegan) The invalid page would not affect normal index operations, but it might cause failures in subsequent VACUUMs. If that has happened to one of your indexes, recover by reindexing the index. regards, tom lane
Re: Draft back-branch release notes are up for review
On Sat, Jun 15, 2019 at 02:11:41PM -0700, Peter Geoghegan wrote: > On Sat, Jun 15, 2019 at 1:39 PM Noah Misch wrote: > > To me, this text implies a cautious DBA should amcheck every index. Reading > > the thread[1], I no longer think that. It's enough to monitor that VACUUM > > doesn't start failing persistently on any index. I suggest replacing this > > release note text with something like the following: > > > > Avoid writing erroneous btree index data that does not change query > > results > > but causes VACUUM to abort with "failed to re-find parent key". Affected > > indexes are rare; REINDEX fixes them. > > > > (I removed "key truncation during a page split" as being too technical for > > release notes.) > > I agree that this isn't terribly significant in general. Your proposed > wording seems better than what we have now, but a reference to INCLUDE > indexes also seems like a good idea. They are the only type of index > that could possibly have the issue with page deletion/VACUUM becoming > confused. If true, that's important to mention, yes.
Re: Draft back-branch release notes are up for review
On Sat, Jun 15, 2019 at 2:11 PM Peter Geoghegan wrote: > On Sat, Jun 15, 2019 at 1:39 PM Noah Misch wrote: > > To me, this text implies a cautious DBA should amcheck every index. Reading > > the thread[1], I no longer think that. It's enough to monitor that VACUUM > > doesn't start failing persistently on any index. I suggest replacing this > > release note text with something like the following: FWIW, amcheck won't help here. It can only access pages through its breadth-first search, and so will not land on any "leaked" page (i.e. page that has no link to the tree). Ideally, amcheck would notice that it hasn't visited certain blocks, and then inspect the blocks/pages in a separate pass, but that doesn't happen right now. As you know, VACUUM can find leaked blocks/pages because nbtree VACUUM has an optimization that allows it to access them in sequential order. -- Peter Geoghegan
Re: Draft back-branch release notes are up for review
On Sat, Jun 15, 2019 at 1:39 PM Noah Misch wrote: > To me, this text implies a cautious DBA should amcheck every index. Reading > the thread[1], I no longer think that. It's enough to monitor that VACUUM > doesn't start failing persistently on any index. I suggest replacing this > release note text with something like the following: > > Avoid writing erroneous btree index data that does not change query results > but causes VACUUM to abort with "failed to re-find parent key". Affected > indexes are rare; REINDEX fixes them. > > (I removed "key truncation during a page split" as being too technical for > release notes.) I agree that this isn't terribly significant in general. Your proposed wording seems better than what we have now, but a reference to INCLUDE indexes also seems like a good idea. They are the only type of index that could possibly have the issue with page deletion/VACUUM becoming confused. Even then, the risk seems minor, because there has to be an OOM at precisely the wrong point. If there was any kind of _bt_split() OOM in 11.3 that involved a non-INCLUDE B-Tree index, then the OOM could only occur when we allocate a temp page buffer. I verified that this causes no significant issue for VACUUM. It is best avoided, since we still "leak" the new page/buffer, although that is almost harmless. -- Peter Geoghegan
Re: Draft back-branch release notes are up for review
On Fri, Jun 14, 2019 at 04:58:47PM -0400, Tom Lane wrote: > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0995cefa74510ee0e38d1bf095b2eef2c1ea37c4 > + > + > + Avoid corruption of a btree index in the unlikely case that a failure > + occurs during key truncation during a page split (Peter Geoghegan) > + To me, this text implies a cautious DBA should amcheck every index. Reading the thread[1], I no longer think that. It's enough to monitor that VACUUM doesn't start failing persistently on any index. I suggest replacing this release note text with something like the following: Avoid writing erroneous btree index data that does not change query results but causes VACUUM to abort with "failed to re-find parent key". Affected indexes are rare; REINDEX fixes them. (I removed "key truncation during a page split" as being too technical for release notes.) [1] https://postgr.es/m/flat/CAH2-WzkcWT_-NH7EeL=Az4efg0KCV+wArygW8zKB=+HoP=v...@mail.gmail.com
Re: Do we expect tests to work with default_transaction_isolation=serializable
On Sun, May 19, 2019 at 03:55:06PM -0700, Andres Freund wrote: > I seem to recall that we expect tests to either work with > default_transaction_isolation=serializable, or to set it to a different > level where needed. > > Currently that's not the case. When running check-world with PGOPTIONS > set to -c default_transaction_isolation=serializable I get easy to fix > failures (isolation, plpgsql) but also some apparently hanging tests > (003_recovery_targets.pl, 003_standby_2.pl). > > Do we expect this to work? If it's desirable I'll set up an animal that > forces it to on. I'm +1 for making it a project expectation, with an animal to confirm. It's not expected to work today.
Re: pg_upgrade can result in early wraparound on databases with high transaction load
On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> On Mon, May 20, 2019 at 3:10 AM Jason Harvey wrote:
> > This week I upgraded one of my large (2.8TB), high-volume databases from 9
> > to 11. The upgrade itself went fine. About two days later, we unexpectedly
> > hit transaction ID wraparound. What was perplexing about this was that the
> > age of our oldest `datfrozenxid` was only 1.2 billion - far away from where
> > I'd expect a wraparound. Curiously, the wraparound error referred to a
> > mysterious database of `OID 0`:
> >
> > UPDATE ERROR: database is not accepting commands to avoid wraparound data
> > loss in database with OID 0

That's bad.

> > We were able to recover after a few hours by greatly speeding up our vacuum
> > on our largest table.

For what it's worth, a quicker workaround is to VACUUM FREEZE any database, however small. That forces a vac_truncate_clog(), which recomputes the wrap point from pg_database.datfrozenxid values. This demonstrates the workaround:

--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -248,7 +248,10 @@
 esac

 pg_dumpall --no-sync -f "$temp_root"/dump2.sql || pg_dumpall2_status=$?
+pg_controldata "${PGDATA}"
+vacuumdb -F template1
 pg_ctl -m fast stop
+pg_controldata "${PGDATA}"

 if [ -n "$pg_dumpall2_status" ]; then
 	echo "pg_dumpall of post-upgrade database cluster failed"

> > In a followup investigation I uncovered the reason we hit the wraparound so
> > early, and also the cause of the mysterious OID 0 message. When pg_upgrade
> > executes, it calls pg_resetwal to set the next transaction ID. Within
> > pg_resetwal is the following code:
> > https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450

pg_upgrade should set oldestXID to the same value as the source cluster or set it like vac_truncate_clog() would set it.
Today's scheme is usually too pessimistic, but it can be too optimistic if the source cluster was on the brink of wraparound. Thanks for the report.
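The arithmetic behind the report can be sketched with 32-bit circular XID comparisons. This is a simplified model, not backend code: the safety margins around the real stop point are ignored, and the "bad guess" offset below is hypothetical, chosen only to show how an oldestXID unrelated to the true datfrozenxid can leave the cluster far closer to the stop point than its real XID age implies.

```python
# 32-bit circular XID space, as in TransactionIdPrecedes-style logic:
# A precedes B iff the signed 32-bit difference A - B is negative.
M = 1 << 32

def xid_precedes(a, b):
    return (a - b) % M >= (1 << 31)   # signed (a - b) < 0

def xids_until_stop(next_xid, oldest_xid):
    """Transactions left before the "database is not accepting commands"
    stop point, modeled as ~2^31 ahead of oldest_xid (real servers
    subtract an additional safety margin, ignored here)."""
    stop = (oldest_xid + (1 << 31)) % M
    return (stop - next_xid) % M

next_xid = 3_000_000_000
true_oldest = 2_900_000_000            # real datfrozenxid: true age only 100M
assert xid_precedes(true_oldest, next_xid)

# Hypothetical bad guess: oldestXID derived by subtracting a fixed
# offset from next_xid instead of carrying over the source cluster's
# datfrozenxid.
guessed_oldest = (next_xid - 2_000_000_000) % M

# With the true value there is plenty of headroom; with the guess,
# the cluster starts out much closer to the stop point.
assert xids_until_stop(next_xid, true_oldest) > xids_until_stop(next_xid, guessed_oldest)
```

This is why carrying the source cluster's oldestXID forward (or recomputing it the way vac_truncate_clog() would) matters: the stop point is derived from oldestXID, not from any per-database reality.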
Re: Dead encoding conversion functions
On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree, and immediately tripped over an interesting factoid:
> the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
> untested ... not because the regression tests don't try, but because
> those conversions are unreachable. pg_do_encoding_conversion() and
> its sister functions have hard-wired fast paths for any conversion
> in which the source or target encoding is SQL_ASCII, so that an
> encoding conversion function declared for such a case will never
> be used.
>
> This situation seems kinda silly. My inclination is to delete
> these functions as useless, but I suppose another approach is
> to suppress the fast paths if there's a declared conversion function.
> (Doing so would likely require added catalog lookups in places we
> might not want them...)

Removing the fast paths to make ascii_to_utf8() reachable would cause an ERROR when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would otherwise send the client any character outside 7-bit ASCII. That's fairly defensible, but doing it for only UTF8 and MULE_INTERNAL is not. So if we like the ascii_to_utf8() behavior, I think the action would be to replace the fast path with an encoding-independent verification that all bytes are 7-bit ASCII. (The check would not apply when both server_encoding and client_encoding are SQL_ASCII, of course.)

Alternately, one might prefer to replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8 case, we'd allow byte sequences that are valid UTF8, even though the validity may be a coincidence and mojibake may ensue.

SQL_ASCII is for being casual about encoding, so it's not clear to me whether or not either prospective behavior change would be an improvement. However, it does seem clear to me that ascii_to_utf8() and ascii_to_mic() should be deleted.
> If we do delete them as useless, it might also be advisable to change
> CreateConversionCommand() to refuse creation of conversions to/from
> SQL_ASCII, to prevent future confusion.

Sounds good.
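The encoding-independent verification suggested above — reject any byte outside 7-bit ASCII rather than blindly passing data through — is simple to sketch. This is only a model of the proposed check, not the backend's conversion code; the function name and error text are invented.

```python
def verify_7bit_ascii(data: bytes) -> None:
    """Raise if any byte is outside 7-bit ASCII -- the check proposed
    in place of the blind SQL_ASCII fast path. Bytes >= 0x80 are not
    meaningful in SQL_ASCII, so passing them through to a UTF8 client
    may be silently producing an invalid byte sequence."""
    for i, b in enumerate(data):
        if b >= 0x80:
            raise ValueError(f"non-ASCII byte 0x{b:02x} at offset {i}")

verify_7bit_ascii(b"plain ascii only")          # passes silently

# "naive" with a diaeresis encodes to two bytes >= 0x80 in UTF-8,
# which the check rejects:
try:
    verify_7bit_ascii("na\u00efve".encode("utf-8"))
    raised = False
except ValueError:
    raised = True
assert raised
```

The alternative sketched in the message — validating that the bytes form legal UTF8 rather than legal ASCII — would accept the second input, at the risk of mojibake when the validity is coincidental.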
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock
On 2019-Jun-14, Alvaro Herrera wrote:
> I think there are worse problems here. I tried the attached isolation
> spec. Note that the only difference in the two permutations is that s0
> finishes earlier in one than the other; yet the first one works fine and
> the second one hangs until killed by the 180s timeout. (s3 isn't
> released for a reason I'm not sure I understand.)

Actually, those behaviors both seem correct to me now that I look closer. So this was a false alarm. In the code before de87a084c0, the first permutation deadlocks, and the second permutation hangs. The only behavior change is that the first one no longer deadlocks, which is the desired change.

I'm still trying to form a case to exercise the case of skip_tuple_lock having the wrong lifetime.

The fact that both permutations behave differently, even though the only difference is where s0 commits relative to the s3_share step, is an artifact of our unusual tuple locking implementation.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Extracting only the columns needed for a query
Melanie Plageman writes:
> While hacking on zedstore, we needed to get a list of the columns to
> be projected--basically all of the columns needed to satisfy the
> query. The two use cases we have for this is
> 1) to pass this column list down to the AM layer for the AM to leverage it
> 2) for use during planning to improving costing
> In other threads, such as [1], there has been discussion about the
> possible benefits for all table types of having access to this set of
> columns.

The thing that most approaches to this have fallen down on is triggers --- that is, a trigger function might access columns mentioned nowhere in the SQL text. (See 8b6da83d1 for a recent example :-() If you have a plan for dealing with that, then ...

> Approach B: after parsing and/or after planning

If we wanted to do something about this, making the planner record the set of used columns seems like the thing to do. We could avoid the expense of doing it when it's not needed by setting up an AM/FDW/etc property or callback to request it.

Another reason for having the planner do this is that presumably, in an AM that's excited about this, the set of fetched columns should play into the cost estimates for the scan. I've not been paying enough attention to the tableam work to know if we've got hooks for the AM to affect scan costing ... but if we don't, that seems like a hole that needs plugged.

> I'm not sure, however, if just getting the
> PathTargets from the RelOptInfos is sufficient for obtaining the
> whole set of columns used in the query.

The PathTarget only records the columns that need to be *emitted* by the relation scan plan node. You'd also have to look at the baserestrictinfo conditions to see what columns are inspected by filter conditions. But that seems fine to me anyway since the AM might conceivably have different requirements for filter variables than emitted variables. (As a concrete example, filter conditions that are handled by non-lossy index checks might not give rise to any requirement for the table AM to do anything.) So that line of thought confirms that we want to do this at the end of planning when we know the shape of the plan tree; the parser can't do it usefully.

> Approach B, however, does not work for utility statements which do
> not go through planning.

I'm not sure why you're excited about that case? Utility statements tend to be pretty much all-or-nothing as far as data access goes.

regards, tom lane
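The distinction drawn above — columns a scan must *emit* (PathTarget) versus columns only *inspected* by filter conditions (baserestrictinfo) — can be sketched with a toy expression walker. These classes are simplified stand-ins, not planner nodes; the walker is a toy version of the kind of tree walk pull_varattnos() does in the backend.

```python
from dataclasses import dataclass

@dataclass
class Var:
    """Toy column reference (stand-in for the planner's Var node)."""
    attname: str

@dataclass
class OpExpr:
    """Toy operator expression over arbitrary arguments (column refs,
    constants, or nested expressions)."""
    args: list

def collect_columns(node, acc):
    """Collect every column referenced under a node -- a toy version of
    the backend's pull_varattnos() tree walk."""
    if isinstance(node, Var):
        acc.add(node.attname)
    elif isinstance(node, OpExpr):
        for a in node.args:
            collect_columns(a, acc)
    return acc           # constants and other leaves contribute nothing

# Model of: SELECT a, b FROM t WHERE c > 42
path_target = [Var("a"), Var("b")]            # columns the scan must emit
baserestrictinfo = [OpExpr([Var("c"), 42])]   # columns only filtered on

emitted, filtered = set(), set()
for expr in path_target:
    collect_columns(expr, emitted)
for qual in baserestrictinfo:
    collect_columns(qual, filtered)

assert emitted == {"a", "b"}
assert filtered == {"c"}
# The union is what the AM would have to be able to fetch -- though, as
# noted above, it may treat the two sets differently (e.g. a filter
# column satisfied by a non-lossy index check need not be fetched).
assert emitted | filtered == {"a", "b", "c"}
```

Keeping the two sets separate is the point: an AM that costs or fetches per-column can price "emit a, b" and "evaluate a qual on c" differently.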
Re: Improve handling of pg_stat_statements handling of bind "IN" variables
Thanks for the feedback.

I thought once again about separating IN from ARRAY, with refactoring etc. as Tom suggested, and now I don't think it's worth doing. I've tried to implement that, and besides requiring changes in every part of the query processing pipeline, it seems that most of the time I would have to repeat (copy/paste) for the IN case all the code that now works for ARRAY.

At first I thought there would be simplifications that would justify such refactoring – i.e. I thought I could at least drop the "multidims" bool that tells ARRAY[] from ARRAY[ARRAY[]]. But it turns out that's not the case – one can write something like "x IN (ARRAY[1], ARRAY[1,2])" that will result in a multidim IN-list array. So I don't think there's actually enough benefit to split those two apart.

Now I want to do this: just add meta info (basically a bool field) to the ArrayExpr struct, so in later stages we can tell whether it's an ArrayExpr for an ARRAY or for an IN list. Plus, skip updating the Jumble for the expression subtree within an IN-list array.

If that approach doesn't seem too bad to anyone, I would like to go forward and submit a patch – it seems pretty straightforward to implement.

Thoughts? Thank you.

--- Pasha Trukhanov
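The effect the proposal aims for — queries differing only in the number of IN-list elements collapsing to one pg_stat_statements entry — can be sketched with a regex normalization. This is a stand-in for illustration only: the actual mechanism proposed is to skip the IN-list subtree when updating the query jumble, not textual rewriting, and a flat regex like this would also (wrongly) collapse any single-level parenthesized expression after IN.

```python
import re

# Collapse literal IN lists so queries that differ only in how many
# elements were bound share a single normalized form. Only handles
# flat lists (no nested parentheses) -- a deliberate simplification.
IN_LIST = re.compile(r"\bIN\s*\(\s*[^()]+\s*\)", re.IGNORECASE)

def normalize(sql: str) -> str:
    return IN_LIST.sub("IN (...)", sql)

q1 = normalize("SELECT * FROM t WHERE col IN (1, 2, 3)")
q2 = normalize("SELECT * FROM t WHERE col IN (1, 2, 3, 4, 5)")
assert q1 == q2 == "SELECT * FROM t WHERE col IN (...)"
```

Doing this at the jumble level instead of textually is precisely what avoids the false positives: the parser already knows whether an ArrayExpr came from an IN list, which is what the proposed bool field would record.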
Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock
Alvaro Herrera writes: > On 2019-Jun-14, Tom Lane wrote: >> BTW, after looking around a bit I wonder if this complaint isn't >> exposing an actual logic bug. Shouldn't skip_tuple_lock have >> a lifetime similar to first_time? > I think there are worse problems here. I tried the attached isolation > spec. Note that the only difference in the two permutations is that s0 > finishes earlier in one than the other; yet the first one works fine and > the second one hangs until killed by the 180s timeout. (s3 isn't > released for a reason I'm not sure I understand.) Ugh. > I don't think I'm going to have time to investigate this deeply over the > weekend, so I think the safest course of action is to revert this for > next week's set. +1. This is an old bug, we don't have to improve it for this release. regards, tom lane
assertion at postmaster start
Once in a blue moon I get this assertion failure on server start:

2019-06-15 12:00:29.650 -04 [30080] LOG:  starting PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, 64-bit
2019-06-15 12:00:29.650 -04 [30080] LOG:  listening on IPv4 address "127.0.0.1", port 55432
2019-06-15 12:00:29.650 -04 [30080] LOG:  listening on Unix socket "/tmp/.s.PGSQL.55432"
2019-06-15 12:00:29.658 -04 [30956] LOG:  database system was shut down at 2019-06-15 12:00:24 -04
2019-06-15 12:00:29.659 -04 [30080] LOG:  server process (PID 30107) exited with exit code 15
2019-06-15 12:00:29.659 -04 [30080] LOG:  terminating any other active server processes
TRAP: FailedAssertion("!(AbortStartTime == 0)", File: "/pgsql/source/master/src/backend/postmaster/postmaster.c", Line: 2957)
Aborted (core dumped)

(Log messages translated from the original Spanish.)  I cannot readily reproduce this. In essence, this shows a normal startup until suddenly process 30107 terminates with exit code 15, and then, while shutting everything down, postmaster hits the aforementioned assertion and terminates. One problem with debugging this is that I don't know what process 30107 is, since the logs don't mention it. No idea what is going on, but I'm going to set my script to start the server with log_min_messages=debug1, in case I hit it again ...

Has anybody else seen this?

-- 
Álvaro Herrera
Fix typos and inconsistencies for v11+
Hello hackers,

Please consider fixing the following typos and inconsistencies living in the source code starting from v11:

3.1 add_placeholders_to_child_joinrel -> remove (orphaned after 7cfdc770)
3.2 AttachIndexInfo -> IndexAttachInfo (an internal inconsistency)
3.3 BlockRecordInfo -> BlockInfoRecord (an internal inconsistency)
3.4 bount -> bound (a typo)
3.5 CopyBoth -> CopyBothResponse (an internal inconsistency)
3.6 directy -> directory (a typo)
3.7 ExecCreateSlotFromOuterPlan -> ExecCreateScanSlotFromOuterPlan (an internal inconsistency)
3.8 es_epqTuple -> es_epqTupleSlot (an internal inconsistency)
3.9 ExecHashTableParallelInsert -> ExecParallelHashTableInsert (an internal inconsistency)
3.10 ExecMakeFunctionResult -> ExecMakeFunctionResultSet (an internal inconsistency)
3.11 fmgr_builtins_oid_index -> fmgr_builtin_oid_index (an internal inconsistency)
3.12 freeScanStack -> remove (irrelevant after 2a636834, v12 only)
3.13 GetTupleTrigger -> GetTupleForTrigger (an internal inconsistency)
3.14 grow_barrier -> grow_batches_barrier (an internal inconsistency)
3.15 HAVE__BUIILTIN_CLZ -> HAVE__BUILTIN_CLZ (a typo, v12 only)
3.16 ignored_killed_tuples -> ignore_killed_tuples (a typo)
3.17 intset_tests_stats -> intset_test_stats (an internal inconsistency, v12 only)
3.18 is_aggregate -> objtype (needed to determine error handling and required result type) (an internal inconsistency)
3.19 iterate_json_string_values -> iterate_json_values (renamed in 1c1791e0)
3.20 $log_number -> remove (not used since introduction in ed8a7c6f)
3.21 mechinism -> mechanism (a typo)
3.22 new_node, new_node_item -> child, child_key (an internal inconsistency, v12 only)
3.23 new_part_constaints -> new_part_constraints (a typo)
3.24 parentIndexOid -> parentIndexId (for the sake of consistency, but this argument is still unused since 8b08f7d4)
3.25 particiant -> participant (a typo)
3.26 PathNameCreateShared -> SharedFileSetCreate (an internal inconsistency)
3.27 PathnameCreateTemporaryDir -> PathNameCreateTemporaryDir (an inconsistent case)
3.28 pg_access_server_files -> pg_read_server_files or pg_write_server_files (non-existing role referenced)
3.29 pg_beginmessage_reuse -> pq_beginmessage_reuse (a typo)
3.30 Form_pg_fdw & pg_fdw -> Form_pg_foreign_data_wrapper & pg_foreign_data_wrapper (an internal inconsistency)
3.31 PG_MCV_LIST -> pg_mcv_list (an internal inconsistency, v12 only)
3.32 pg_partition_table -> pg_partitioned_table (an internal inconsistency)
3.33 pg_write -> pg_pwrite (an internal inconsistency, v12 only)
3.34 PLyObject_FromJsonb -> PLyObject_FromJsonbContainer (an internal inconsistency)
3.35 port_win32.h -> win32_port.h (an internal inconsistency)
3.36 PruneCtxStateIdx -> PruneCxtStateIdx (an internal inconsistency)
3.37 SetErrormode -> SetErrorMode (an internal inconsistency)
3.38 SharedRecordTypemodRegistry -> SharedRecordTypmodRegistry (an internal inconsistency)
3.39 SharedTupleStore -> SharedTuplestore (an internal inconsistency)
3.40 shm_mq_get_receive_bytes -> shm_mq_receive_bytes (an internal inconsistency)
3.41 t_natts -> number-of-attributes (questionable) (renamed in storage.sgml with 3e23b68d, but one reference is left)
3.42 tts_buffer -> remove (orphaned after 4da597ed, v12 only)
3.43 tts_flag -> tts_flags (an internal inconsistency, v12 only)
3.44 tts_off -> remove (orphaned after 4da597ed, v12 only)
3.45 _vaues -> _values (a typo)
3.46 wait_event_class -> wait_event_type (an internal inconsistency)
3.47 WarnNoTranactionBlock -> WarnNoTransactionBlock (a typo)
3.48 with-wal-segsize -> remove (orphaned after fc49e24f)
3.49 XLOG_SEG_SIZE -> WAL segment size (orphaned after fc49e24fa)

Two summary patches for REL_11_STABLE and master are attached.
Best regards,
Alexander

diff --git a/contrib/jsonb_plpython/jsonb_plpython.c b/contrib/jsonb_plpython/jsonb_plpython.c
index 1bc984d5c4..1c93a80fbc 100644
--- a/contrib/jsonb_plpython/jsonb_plpython.c
+++ b/contrib/jsonb_plpython/jsonb_plpython.c
@@ -133,7 +133,7 @@ PLyObject_FromJsonbValue(JsonbValue *jsonbValue)
 }
 
 /*
- * PLyObject_FromJsonb
+ * PLyObject_FromJsonbContainer
  *
  * Transform JsonbContainer to PyObject.
  */
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 3bd0010bf8..1fd65f30df 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -360,7 +360,7 @@ apw_load_buffers(void)
 		Oid			current_db = blkinfo[j].database;
 
 		/*
-		 * Advance the prewarm_stop_idx to the first BlockRecordInfo that does
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
 		 * not belong to this database.
 		 */
 		j++;
@@ -369,7 +369,7 @@ apw_load_buffers(void)
 			if (current_db != blkinfo[j].database)
 			{
 				/*
-				 * Combine BlockRecordInfos for global objects with those of
+				 * Combine BlockInfoRecords for global objects with those of
 				 * the database.
 				 */
 				if (current_db != InvalidOid)
@@ -382,7 +382,7 @@ apw_load_buffers(void)
 /*
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
On Fri, Jun 14, 2019 at 09:37:37PM -0400, Joe Conway wrote: > On 6/14/19 6:09 PM, Bruce Momjian wrote: > > On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote: > >> On 6/13/19 11:07 AM, Bruce Momjian wrote: > >> > In addition, while the 8k blocks would use a block cipher, the WAL would > >> > likely use a stream cipher, and it will be very hard to use multiple > >> > stream ciphers in a single WAL file. > >> > >> I don't understand why we would not just use AES for both. > > > > Uh, AES is an encryption cipher. You can use it with block mode, CBC, > > or stream mode, CTR, GCM; see: > > > AES is a block cipher, not a stream cipher. Yes you can use it in > different modes, including chained modes (and CBC is what I would pick), > but I assumed you were talking about an actual stream cipher algorithm. OK, to be specific, I was thinking of using aes128-cbc for the 8k pages and aes128-ctr for the WAL. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
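[Editorial note: the CBC-vs-CTR distinction above can be made concrete with a toy sketch. The "block cipher" below is just an XOR with the key and has no security whatsoever; it only has AES's 16-byte block shape, so it can show the operational difference being discussed: CBC consumes whole blocks (natural for fixed-size 8kB pages), while CTR turns the block cipher into a keystream whose ciphertext matches the plaintext length byte-for-byte (natural for an append-only WAL).]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 16				/* AES block size in bytes */

/* Toy "block cipher": XOR with a fixed key block.  Illustrative only --
 * it has none of AES's properties except the 16-byte block shape. */
static void
toy_encrypt_block(const unsigned char key[BLOCK], unsigned char blk[BLOCK])
{
	for (int i = 0; i < BLOCK; i++)
		blk[i] ^= key[i];
}

/* CBC: operates on whole blocks, so input is padded up to a block
 * multiple (PKCS#7-style, always adding at least one byte).  Returns
 * the padded ciphertext length. */
static size_t
cbc_encrypt(const unsigned char key[BLOCK], const unsigned char iv[BLOCK],
			const unsigned char *in, size_t len, unsigned char *out)
{
	unsigned char prev[BLOCK];
	size_t		padded = ((len / BLOCK) + 1) * BLOCK;
	unsigned char pad = (unsigned char) (padded - len);

	memcpy(prev, iv, BLOCK);
	for (size_t off = 0; off < padded; off += BLOCK)
	{
		unsigned char blk[BLOCK];

		for (int i = 0; i < BLOCK; i++)
		{
			unsigned char b = (off + i < len) ? in[off + i] : pad;

			blk[i] = b ^ prev[i];	/* chain with previous ciphertext */
		}
		toy_encrypt_block(key, blk);
		memcpy(out + off, blk, BLOCK);
		memcpy(prev, blk, BLOCK);
	}
	return padded;
}

/* CTR: encrypt a counter to produce a keystream, XOR it with the data.
 * Ciphertext is exactly as long as the plaintext -- no padding -- which
 * is what makes a block cipher usable in a stream-like fashion. */
static size_t
ctr_encrypt(const unsigned char key[BLOCK], unsigned long long ctr0,
			const unsigned char *in, size_t len, unsigned char *out)
{
	for (size_t off = 0; off < len; off += BLOCK)
	{
		unsigned char ks[BLOCK] = {0};
		unsigned long long ctr = ctr0 + off / BLOCK;

		memcpy(ks, &ctr, sizeof(ctr));	/* counter block */
		toy_encrypt_block(key, ks);		/* keystream = E(key, counter) */
		for (size_t i = 0; i < BLOCK && off + i < len; i++)
			out[off + i] = in[off + i] ^ ks[i];
	}
	return len;
}
```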