Re: PG 12 beta 1 segfault during analyze

2019-06-15 Thread Tom Lane
Steve Singer  writes:
> I encountered the following segfault when running an ANALYZE against a
> table on PG 12 beta1.

Nobody else has reported this, so you're going to have to work on
producing a self-contained test case, or else debugging it yourself.

regards, tom lane




PG 12 beta 1 segfault during analyze

2019-06-15 Thread Steve Singer

I encountered the following segfault when running an ANALYZE against a
table on PG 12 beta1.


#0  0x56008ad0c826 in update_attstats (vacattrstats=0x0, 
natts=2139062143, inh=false,
relid=0x40>) at analyze.c:572
#1  do_analyze_rel (onerel=onerel@entry=0x7f0bc59a7a38, 
params=params@entry=0x7ffe06aeabb0, va_cols=va_cols@entry=0x0,
acquirefunc=, relpages=8, inh=inh@entry=false, 
in_outer_xact=false, elevel=13) at analyze.c:572
#2  0x56008ad0d2e0 in analyze_rel (relid=, 
relation=,
params=params@entry=0x7ffe06aeabb0, va_cols=0x0, 
in_outer_xact=, bstrategy=)

at analyze.c:260
#3  0x56008ad81300 in vacuum (relations=0x56008c4d1110, 
params=params@entry=0x7ffe06aeabb0,
bstrategy=, bstrategy@entry=0x0, 
isTopLevel=isTopLevel@entry=true) at vacuum.c:413
#4  0x56008ad8197f in ExecVacuum 
(pstate=pstate@entry=0x56008c5c2688, vacstmt=vacstmt@entry=0x56008c3e0428,

isTopLevel=isTopLevel@entry=true) at vacuum.c:199
#5  0x56008af0133b in standard_ProcessUtility (pstmt=0x56008c982e50,
queryString=0x56008c3df368 "select 
\"_disorder_replica\".finishTableAfterCopy(3); analyze 
\"disorder\".\"do_inventory\"; ", context=, params=0x0, 
queryEnv=0x0, dest=0x56008c9831d8, completionTag=0x7ffe06aeaef0 "")

at utility.c:670
#6  0x56008aefe112 in PortalRunUtility (portal=0x56008c4515f8, 
pstmt=0x56008c982e50, isTopLevel=,
setHoldSnapshot=, dest=, 
completionTag=0x7ffe06aeaef0 "") at pquery.c:1175
#7  0x56008aefec91 in PortalRunMulti 
(portal=portal@entry=0x56008c4515f8, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, 
dest=dest@entry=0x56008c9831d8, altdest=altdest@entry=0x56008c9831d8,

completionTag=completionTag@entry=0x7ffe06aeaef0 "") at pquery.c:1328
#8  0x56008aeff9e9 in PortalRun (portal=portal@entry=0x56008c4515f8, 
count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, 
dest=dest@entry=0x56008c9831d8,
altdest=altdest@entry=0x56008c9831d8, completionTag=0x7ffe06aeaef0 
"") at pquery.c:796

#9  0x56008aefb6bb in exec_simple_query (
query_string=0x56008c3df368 "select 
\"_disorder_replica\".finishTableAfterCopy(3); analyze 
\"disorder\".\"do_inventory\"; ") at postgres.c:1215



With master from today (aa087ec64f703a52f3c48c) I still get segfaults
under do_analyze_rel:


#0  compute_index_stats (onerel=0x7f84bf1436a8, col_context=0x55a5d3d56640, 
numrows=, rows=0x55a5d4039520,
nindexes=, indexdata=0x3ff0, 
totalrows=500) at analyze.c:711
#1  do_analyze_rel (onerel=onerel@entry=0x7f84bf1436a8, 
params=0x7ffdde2b5c40, params@entry=0x3ff0,
va_cols=va_cols@entry=0x0, acquirefunc=, 
relpages=11, inh=inh@entry=false, in_outer_xact=true,

elevel=13) at analyze.c:552






Re: The flinfo->fn_extra question, from me this time.

2019-06-15 Thread Chapman Flack
On 06/15/19 21:21, Tom Lane wrote:
> Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
> infrastructure.)

That had crossed my mind ... but it seems there's around 80 or 100
lines of good stuff there that'd be a shame to duplicate. If only
init_MultiFuncCall() took an extra void ** argument, and the stock
SRF_FIRSTCALL_INIT passed &(fcinfo->flinfo->fn_extra), it seems like
most of it would be reusable. shutdown_MultiFuncCall would need to work
slightly differently, and a caller who wanted to be different would need
a customized variant of SRF_PERCALL_SETUP, but that's two lines.
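
Roughly this (a hypothetical sketch of the refactoring, not the actual
fmgr code; the extra parameter is the part being proposed):

    FuncCallContext *init_MultiFuncCall(PG_FUNCTION_ARGS, void **stash);

    #define SRF_FIRSTCALL_INIT() \
        init_MultiFuncCall(fcinfo, &(fcinfo->flinfo->fn_extra))

A caller who wanted the state somewhere other than fn_extra would then
call init_MultiFuncCall(fcinfo, &my_pointer) directly.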

Cheers,
-Chap




Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-15 Thread Bruce Momjian
On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
> On 6/13/19 11:07 AM, Bruce Momjian wrote:
> > On Thu, Jun 13, 2019 at 04:26:47PM +0900, Masahiko Sawada wrote:
> >> Yeah, in principle since data key of 2 tier key architecture should
> >> not go outside database I think we should not tell data keys to
> >> utility commands. So the rearranging WAL format seems to be a better
> >> solution but is there any reason why the main data is placed at end of
> >> WAL record? I wonder if we can assemble WAL records as following order
> >> and encrypt only 3 and 4.
> >> 
> >> 1. Header data (XLogRecord and other headers)
> >> 2. Main data (xl_heap_insert, xl_heap_update etc + related data)
> >> 3. Block data (Tuple data, FPI)
> >> 4. Sub data (e.g tuple data for logical decoding)
> > 
> > Yes, that does sound like a reasonable idea.  It is similar to us not
> > encrypting the clog --- there is little value.  However, if we only
> > encrypt the cluster, we don't need to expose the relfilenode and we can
> > just encrypt the entire WAL --- I like that simplicity.  We might find
> > that the complexity of encrypting only certain tablespaces makes the
> > system slower than just encrypting the entire cluster.
> 
> 
> There are reasons other than performance to want more granular than
> entire cluster encryption. Limiting the volume of encrypted data with
> any one key for example. And not encrypting #1 & 2 above would help
> avoid known-plaintext attacks I would think.

There are no known plaintext attacks on AES that do better than
exhaustive search:


https://crypto.stackexchange.com/questions/1512/why-is-aes-resistant-to-known-plaintext-attacks

Even if we don't encrypt the first part of the WAL record (1 & 2), the
block data (3) probably has enough predictable structure to enable a
known-plaintext attack anyway.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: The flinfo->fn_extra question, from me this time.

2019-06-15 Thread Tom Lane
Chapman Flack  writes:
> So please let me know if I seem to correctly understand the limits
> on its use.

> I gather that various extensions use it to stash various things. But
> (I assume) ... they will only touch fn_extra in FmgrInfo structs that
> pertain to *their own functions*. (Please say that's true?)

> IOW, it is assured that, if I am a language handler, when I am called
> to handle a function in my language, fn_extra is mine to use as I please ...

Yup.

> ... with the one big exception, if I am handling a function in my language
> that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
> fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
> and then I can stash my stuff in gunk->user_fctx for the duration of that
> SRF call.

Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
infrastructure.)

Keep in mind that in most contexts, whatever you cache in fn_extra
will only be there for the life of the current query.
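
The usual caching idiom looks something like this (a sketch; MyCache and
its contents are hypothetical):

    #include "postgres.h"
    #include "fmgr.h"

    typedef struct MyCache
    {
        bool        valid;      /* stand-in for expensive-to-compute state */
    } MyCache;

    Datum
    my_func(PG_FUNCTION_ARGS)
    {
        MyCache    *cache = (MyCache *) fcinfo->flinfo->fn_extra;

        if (cache == NULL)
        {
            /* allocate in fn_mcxt so it lives exactly as long as the FmgrInfo */
            cache = MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt,
                                           sizeof(MyCache));
            cache->valid = true;        /* ... fill in expensive state ... */
            fcinfo->flinfo->fn_extra = cache;
        }

        PG_RETURN_BOOL(cache->valid);
    }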

regards, tom lane




The flinfo->fn_extra question, from me this time.

2019-06-15 Thread Chapman Flack
Hi hackers,

I see evidence on this list that it's sort of a rite of passage
to ask the flinfo->fn_extra question, and my time has come.

So please let me know if I seem to correctly understand the limits
on its use.

I gather that various extensions use it to stash various things. But
(I assume) ... they will only touch fn_extra in FmgrInfo structs that
pertain to *their own functions*. (Please say that's true?)

IOW, it is assured that, if I am a language handler, when I am called
to handle a function in my language, fn_extra is mine to use as I please ...

... with the one big exception, if I am handling a function in my language
that returns a set, and I will use SFRM_ValuePerCall mode, I have to leave
fn_extra NULL before SRF_FIRSTCALL_INIT(), which plants its own gunk there,
and then I can stash my stuff in gunk->user_fctx for the duration of that
SRF call.
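
In skeleton form, the pattern I have in mind (a sketch returning three
dummy integers; not code from any particular extension):

    #include "postgres.h"
    #include "funcapi.h"

    Datum
    my_srf(PG_FUNCTION_ARGS)
    {
        FuncCallContext *funcctx;

        if (SRF_IS_FIRSTCALL())
        {
            funcctx = SRF_FIRSTCALL_INIT();     /* plants its gunk in fn_extra */

            /* per-call-series state belongs in user_fctx, allocated in the
             * multi-call context so it survives between calls */
            funcctx->user_fctx =
                MemoryContextAllocZero(funcctx->multi_call_memory_ctx,
                                       sizeof(int));
        }
        funcctx = SRF_PERCALL_SETUP();

        if (funcctx->call_cntr < 3)
            SRF_RETURN_NEXT(funcctx, Int32GetDatum((int32) funcctx->call_cntr));

        SRF_RETURN_DONE(funcctx);
    }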

Does that seem to catch the essentials?

Thanks,
-Chap


p.s.: noticed in fmgr/README: "Note that simple "strict" functions can
ignore both isnull and args[i].isnull, since they won't even get called
when there are any TRUE values in args[].isnull."

I get why a strict function can ignore args[i].isnull, but is the part
about ignoring isnull a mistake? A strict function can be passed all
non-null arguments and still return null if it wants to, right?
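
For instance (a sketch of the case in question):

    #include "postgres.h"
    #include "fmgr.h"

    /* declared STRICT at the SQL level */
    Datum
    nullif_zero(PG_FUNCTION_ARGS)
    {
        int32       arg = PG_GETARG_INT32(0);   /* never null: we're strict */

        if (arg == 0)
            PG_RETURN_NULL();           /* sets fcinfo->isnull = true */
        PG_RETURN_INT32(arg);
    }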




Re: Improve handling of pg_stat_statements handling of bind "IN" variables

2019-06-15 Thread Greg Stark
On Sat., Jun. 15, 2019, 8:30 p.m. Greg Stark,  wrote:

> So what would this do for someone who explicitly writes:
>
> WHERE col = ANY (?)
>
> and passes an array?

Actually, thinking about this for two more seconds, the question is what
it would do with a query like

WHERE col = ANY ('1,2,3'::integer[])

or

WHERE col = ANY (ARRAY[1,2,3])

whichever of these the language binding that is failing to do
parameterized queries uses to emulate them.



Re: Improve handling of pg_stat_statements handling of bind "IN" variables

2019-06-15 Thread Greg Stark
On Sat., Jun. 15, 2019, 12:29 p.m. Pavel Trukhanov, <
pavel.trukha...@gmail.com> wrote:

>
> So I don't think there's actually enough benefit to splitting those two
> apart.
>
> Now I want to do this: just add meta info (basically a bool field) to
> the ArrayExpr struct, so at later stages we can tell whether it is the
> ArrayExpr of an ARRAY or of an IN list, plus skip updating the Jumble
> for the expression subtree within an IN-list array.
>
> If that approach doesn't seem too bad to anyone, I would like to go
> forward and submit a patch; it seems pretty straightforward to
> implement.

So what would this do for someone who explicitly writes:

WHERE col = ANY (?)

and passes an array?



Re: Multivariate MCV stats can leak data to unprivileged users

2019-06-15 Thread Tomas Vondra

On Thu, Jun 13, 2019 at 07:37:45PM +0200, Tomas Vondra wrote:

...

OK, attached are patches fixing the issues reported by you and John
Naylor, and squashing the parts into just two patches (catalog split and
pg_stats_ext). Barring objections, I'll push those tomorrow.

I've renamed columns in the _data catalog from 'stx' to 'stxd', which I
think is appropriate given the "data" in the catalog name.

I'm wondering if we should change the examples in SGML docs (say, in
planstats.sgml) to use the new pg_stats_ext view, instead of querying the
catalogs directly. I've tried doing that, but I found the results less
readable than what we currently have (especially for the MCV list, where
it'd require matching elements in multiple arrays). So I've left this
unchanged for now.



I've pushed those changes, after adding docs for the pg_stats_ext view.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: CREATE STATISTICS documentation bug

2019-06-15 Thread Tomas Vondra

On Fri, Jun 14, 2019 at 03:23:29PM -0400, Robert Haas wrote:

https://www.postgresql.org/docs/12/sql-createstatistics.html contains
this example command:

CREATE STATISTICS s2 (mcv) ON (a, b) FROM t2;

But that produces:

psql: ERROR:  only simple column references are allowed in CREATE STATISTICS

I think the parentheses around (a, b) just need to be removed.

P.S. I think the fact that we print "psql: " before the ERROR here is
useless clutter.  We didn't do that in v11 and prior and I think we
should kill it with fire.



I've pushed a fix for the docs issue.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-15 Thread Alvaro Herrera
On 2019-Jun-16, Oleksii Kliukin wrote:

> Alvaro Herrera  wrote:
> 
> > On 2019-Jun-14, Alvaro Herrera wrote:
> > 
> >> I think there are worse problems here.  I tried the attached isolation
> >> spec.  Note that the only difference in the two permutations is that s0
> >> finishes earlier in one than the other; yet the first one works fine and
> >> the second one hangs until killed by the 180s timeout.  (s3 isn't
> >> released for a reason I'm not sure I understand.)
> > 
> > Actually, those behaviors both seem correct to me now that I look
> > closer.  So this was a false alarm.  In the code before de87a084c0, the
> > first permutation deadlocks, and the second permutation hangs.  The only
> > behavior change is that the first one no longer deadlocks, which is the
> > desired change.
> > 
> > I'm still trying to form a case to exercise the case of skip_tuple_lock
> > having the wrong lifetime.
> 
> Hm… I think it was an oversight on my part not to give skip_tuple_lock the
> same lifetime as have_tuple_lock or first_time (also initializing it to
> false at the same time). Even if it might not break anything in an
> obvious way now, a backward jump to the l3 label will leave skip_tuple_lock
> uninitialized, making it very dangerous for any future code that relies
> on its value.

But that's not the danger ... with the current coding, it's initialized
to false every time through that block; that means the tuple lock will
never be skipped if we jump back to l3.  So the danger is that the first
iteration sets the variable, then jumps back; second iteration
initializes the variable again, so instead of skipping the lock, it
takes it, causing a spurious deadlock.
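
A self-contained illustration of that trap (not the actual heapam.c
code, just the scoping pitfall in miniature):

    #include <stdbool.h>
    #include <stdio.h>

    int
    main(void)
    {
        int         pass = 0;

    retry:                              /* plays the role of l3 */
        {
            bool        skip_tuple_lock = false;    /* reset on every pass */

            if (pass == 0)
                skip_tuple_lock = true; /* decision made on the first pass */

            printf("pass %d: skip_tuple_lock = %d\n", pass, skip_tuple_lock);

            if (++pass < 2)
                goto retry;             /* the second pass sees false again */
        }
        return 0;
    }

The first pass prints 1, the second prints 0: the flag set before the
backward jump is silently forgotten.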

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Draft back-branch release notes are up for review

2019-06-15 Thread Noah Misch
On Sat, Jun 15, 2019 at 06:05:00PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Sat, Jun 15, 2019 at 02:11:41PM -0700, Peter Geoghegan wrote:
> >> I agree that this isn't terribly significant in general. Your proposed
> >> wording seems better than what we have now, but a reference to INCLUDE
> >> indexes also seems like a good idea. They are the only type of index
> >> that could possibly have the issue with page deletion/VACUUM becoming
> >> confused.
> 
> > If true, that's important to mention, yes.
> 
> Thanks for the input, guys.  What do you think of
> 
>  Avoid writing an invalid empty btree index page in the unlikely case
>  that a failure occurs while processing INCLUDEd columns during a page
>  split (Peter Geoghegan)
> 
>  The invalid page would not affect normal index operations, but it
>  might cause failures in subsequent VACUUMs. If that has happened to
>  one of your indexes, recover by reindexing the index.

Looks good.




Re: Draft back-branch release notes are up for review

2019-06-15 Thread Peter Geoghegan
On Sat, Jun 15, 2019 at 3:05 PM Tom Lane  wrote:
> Thanks for the input, guys.  What do you think of
>
>  Avoid writing an invalid empty btree index page in the unlikely case
>  that a failure occurs while processing INCLUDEd columns during a page
>  split (Peter Geoghegan)
>
>  The invalid page would not affect normal index operations, but it
>  might cause failures in subsequent VACUUMs. If that has happened to
>  one of your indexes, recover by reindexing the index.

That seems perfect.

Thanks
-- 
Peter Geoghegan




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-15 Thread Oleksii Kliukin
Alvaro Herrera  wrote:

> On 2019-Jun-14, Alvaro Herrera wrote:
> 
>> I think there are worse problems here.  I tried the attached isolation
>> spec.  Note that the only difference in the two permutations is that s0
>> finishes earlier in one than the other; yet the first one works fine and
>> the second one hangs until killed by the 180s timeout.  (s3 isn't
>> released for a reason I'm not sure I understand.)
> 
> Actually, those behaviors both seem correct to me now that I look
> closer.  So this was a false alarm.  In the code before de87a084c0, the
> first permutation deadlocks, and the second permutation hangs.  The only
> behavior change is that the first one no longer deadlocks, which is the
> desired change.
> 
> I'm still trying to form a case to exercise the case of skip_tuple_lock
> having the wrong lifetime.

Hm… I think it was an oversight on my part not to give skip_tuple_lock the
same lifetime as have_tuple_lock or first_time (also initializing it to
false at the same time). Even if it might not break anything in an
obvious way now, a backward jump to the l3 label will leave skip_tuple_lock
uninitialized, making it very dangerous for any future code that relies
on its value.

> The fact that both permutations behave differently, even though the
> only difference is where s0 commits relative to the s3_share step, is an
> artifact of our unusual tuple locking implementation.

Cheers,
Oleksii



Re: Draft back-branch release notes are up for review

2019-06-15 Thread Tom Lane
Noah Misch  writes:
> On Sat, Jun 15, 2019 at 02:11:41PM -0700, Peter Geoghegan wrote:
>> I agree that this isn't terribly significant in general. Your proposed
>> wording seems better than what we have now, but a reference to INCLUDE
>> indexes also seems like a good idea. They are the only type of index
>> that could possibly have the issue with page deletion/VACUUM becoming
>> confused.

> If true, that's important to mention, yes.

Thanks for the input, guys.  What do you think of

 Avoid writing an invalid empty btree index page in the unlikely case
 that a failure occurs while processing INCLUDEd columns during a page
 split (Peter Geoghegan)

 The invalid page would not affect normal index operations, but it
 might cause failures in subsequent VACUUMs. If that has happened to
 one of your indexes, recover by reindexing the index.

regards, tom lane




Re: Draft back-branch release notes are up for review

2019-06-15 Thread Noah Misch
On Sat, Jun 15, 2019 at 02:11:41PM -0700, Peter Geoghegan wrote:
> On Sat, Jun 15, 2019 at 1:39 PM Noah Misch  wrote:
> > To me, this text implies a cautious DBA should amcheck every index.  Reading
> > the thread[1], I no longer think that.  It's enough to monitor that VACUUM
> > doesn't start failing persistently on any index.  I suggest replacing this
> > release note text with something like the following:
> >
> >   Avoid writing erroneous btree index data that does not change query 
> > results
> >   but causes VACUUM to abort with "failed to re-find parent key".  Affected
> >   indexes are rare; REINDEX fixes them.
> >
> > (I removed "key truncation during a page split" as being too technical for
> > release notes.)
> 
> I agree that this isn't terribly significant in general. Your proposed
> wording seems better than what we have now, but a reference to INCLUDE
> indexes also seems like a good idea. They are the only type of index
> that could possibly have the issue with page deletion/VACUUM becoming
> confused.

If true, that's important to mention, yes.




Re: Draft back-branch release notes are up for review

2019-06-15 Thread Peter Geoghegan
On Sat, Jun 15, 2019 at 2:11 PM Peter Geoghegan  wrote:
> On Sat, Jun 15, 2019 at 1:39 PM Noah Misch  wrote:
> > To me, this text implies a cautious DBA should amcheck every index.  Reading
> > the thread[1], I no longer think that.  It's enough to monitor that VACUUM
> > doesn't start failing persistently on any index.  I suggest replacing this
> > release note text with something like the following:

FWIW, amcheck won't help here. It can only access pages through its
breadth-first search, and so will not land on any "leaked" page (i.e.
page that has no link to the tree). Ideally, amcheck would notice that
it hasn't visited certain blocks, and then inspect the blocks/pages in
a separate pass, but that doesn't happen right now.

As you know, VACUUM can find leaked blocks/pages because nbtree VACUUM
has an optimization that allows it to access them in sequential order.

-- 
Peter Geoghegan




Re: Draft back-branch release notes are up for review

2019-06-15 Thread Peter Geoghegan
On Sat, Jun 15, 2019 at 1:39 PM Noah Misch  wrote:
> To me, this text implies a cautious DBA should amcheck every index.  Reading
> the thread[1], I no longer think that.  It's enough to monitor that VACUUM
> doesn't start failing persistently on any index.  I suggest replacing this
> release note text with something like the following:
>
>   Avoid writing erroneous btree index data that does not change query results
>   but causes VACUUM to abort with "failed to re-find parent key".  Affected
>   indexes are rare; REINDEX fixes them.
>
> (I removed "key truncation during a page split" as being too technical for
> release notes.)

I agree that this isn't terribly significant in general. Your proposed
wording seems better than what we have now, but a reference to INCLUDE
indexes also seems like a good idea. They are the only type of index
that could possibly have the issue with page deletion/VACUUM becoming
confused. Even then, the risk seems minor, because there has to be an
OOM at precisely the wrong point.

If there was any kind of _bt_split() OOM in 11.3 that involved a
non-INCLUDE B-Tree index, then the OOM could only occur when we
allocate a temp page buffer. I verified that this causes no
significant issue for VACUUM. It is best avoided, since we still
"leak" the new page/buffer, although that is almost harmless.

-- 
Peter Geoghegan




Re: Draft back-branch release notes are up for review

2019-06-15 Thread Noah Misch
On Fri, Jun 14, 2019 at 04:58:47PM -0400, Tom Lane wrote:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0995cefa74510ee0e38d1bf095b2eef2c1ea37c4

> +
> + 
> +  Avoid corruption of a btree index in the unlikely case that a failure
> +  occurs during key truncation during a page split (Peter Geoghegan)
> + 

To me, this text implies a cautious DBA should amcheck every index.  Reading
the thread[1], I no longer think that.  It's enough to monitor that VACUUM
doesn't start failing persistently on any index.  I suggest replacing this
release note text with something like the following:

  Avoid writing erroneous btree index data that does not change query results
  but causes VACUUM to abort with "failed to re-find parent key".  Affected
  indexes are rare; REINDEX fixes them.

(I removed "key truncation during a page split" as being too technical for
release notes.)

[1] 
https://postgr.es/m/flat/CAH2-WzkcWT_-NH7EeL=Az4efg0KCV+wArygW8zKB=+HoP=v...@mail.gmail.com
 




Re: Do we expect tests to work with default_transaction_isolation=serializable

2019-06-15 Thread Noah Misch
On Sun, May 19, 2019 at 03:55:06PM -0700, Andres Freund wrote:
> I seem to recall that we expect tests to either work with
> default_transaction_isolation=serializable, or to set it to a different
> level where needed.
> 
> Currently that's not the case. When running check-world with PGOPTIONS
> set to -c default_transaction_isolation=serializable I get easy to fix
> failures (isolation, plpgsql) but also some apparently hanging tests
> (003_recovery_targets.pl, 003_standby_2.pl).
> 
> Do we expect this to work? If it's desirable I'll set up an animal that
> forces it to on.

I'm +1 for making it a project expectation, with an animal to confirm.  It's
not expected to work today.




Re: pg_upgrade can result in early wraparound on databases with high transaction load

2019-06-15 Thread Noah Misch
On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> On Mon, May 20, 2019 at 3:10 AM Jason Harvey  wrote:
> > This week I upgraded one of my large(2.8TB), high-volume databases from 9 
> > to 11. The upgrade itself went fine. About two days later, we unexpectedly 
> > hit transaction ID wraparound. What was perplexing about this was that the 
> > age of our oldest `datfrozenxid` was only 1.2 billion - far away from where 
> > I'd expect a wraparound. Curiously, the wraparound error referred to a 
> > mysterious database of `OID 0`:
> >
> > UPDATE ERROR:  database is not accepting commands to avoid wraparound data 
> > loss in database with OID 0

That's bad.

> > We were able to recover after a few hours by greatly speeding up our vacuum 
> > on our largest table.

For what it's worth, a quicker workaround is to VACUUM FREEZE any database,
however small.  That forces a vac_truncate_clog(), which recomputes the wrap
point from pg_database.datfrozenxid values.  This demonstrates the workaround:

--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -248,7 +248,10 @@ case $testhost in
 esac
 
 pg_dumpall --no-sync -f "$temp_root"/dump2.sql || pg_dumpall2_status=$?
+pg_controldata "${PGDATA}"
+vacuumdb -F template1
 pg_ctl -m fast stop
+pg_controldata "${PGDATA}"
 
 if [ -n "$pg_dumpall2_status" ]; then
echo "pg_dumpall of post-upgrade database cluster failed"

> > In a followup investigation I uncovered the reason we hit the wraparound so 
> > early, and also the cause of the mysterious OID 0 message. When pg_upgrade 
> > executes, it calls pg_resetwal to set the next transaction ID. Within 
> > pg_resetwal is the following code: 
> > https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450

pg_upgrade should set oldestXID to the same value as the source cluster or set
it like vac_truncate_clog() would set it.  Today's scheme is usually too
pessimistic, but it can be too optimistic if the source cluster was on the
brink of wrap.  Thanks for the report.




Re: Dead encoding conversion functions

2019-06-15 Thread Noah Misch
On Wed, May 29, 2019 at 03:03:13PM -0400, Tom Lane wrote:
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree, and immediately tripped over an interesting factoid:
> the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
> untested ... not because the regression tests don't try, but because
> those conversions are unreachable.  pg_do_encoding_conversion() and
> its sister functions have hard-wired fast paths for any conversion
> in which the source or target encoding is SQL_ASCII, so that an
> encoding conversion function declared for such a case will never
> be used.

> This situation seems kinda silly.  My inclination is to delete
> these functions as useless, but I suppose another approach is
> to suppress the fast paths if there's a declared conversion function.
> (Doing so would likely require added catalog lookups in places we
> might not want them...)

Removing the fast paths to make ascii_to_utf8() reachable would cause ERROR
when server_encoding=SQL_ASCII, client_encoding=UTF8, and a query would
otherwise send the client any character outside 7-bit ASCII.  That's fairly
defensible, but doing it for only UTF8 and MULE_INTERNAL is not.  So if we
like the ascii_to_utf8() behavior, I think the action would be to replace the
fast path with an encoding-independent verification that all bytes are 7-bit
ASCII.  (The check would not apply when both server_encoding and
client_encoding are SQL_ASCII, of course.)  Alternately, one might prefer to
replace the fast path with an encoding verification; in the SQL_ASCII-to-UTF8
case, we'd allow byte sequences that are valid UTF8, even though the validity
may be a coincidence and mojibake may ensue.  SQL_ASCII is for being casual
about encoding, so it's not clear to me whether or not either prospective
behavior change would be an improvement.  However, I do find it clear to
delete ascii_to_utf8() and ascii_to_mic().

> If we do delete them as useless, it might also be advisable to change
> CreateConversionCommand() to refuse creation of conversions to/from
> SQL_ASCII, to prevent future confusion.

Sounds good.




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-15 Thread Alvaro Herrera
On 2019-Jun-14, Alvaro Herrera wrote:

> I think there are worse problems here.  I tried the attached isolation
> spec.  Note that the only difference in the two permutations is that s0
> finishes earlier in one than the other; yet the first one works fine and
> the second one hangs until killed by the 180s timeout.  (s3 isn't
> released for a reason I'm not sure I understand.)

Actually, those behaviors both seem correct to me now that I look
closer.  So this was a false alarm.  In the code before de87a084c0, the
first permutation deadlocks, and the second permutation hangs.  The only
behavior change is that the first one no longer deadlocks, which is the
desired change.

I'm still trying to form a case to exercise the case of skip_tuple_lock
having the wrong lifetime.


The fact that both permutations behave differently, even though the
only difference is where s0 commits relative to the s3_share step, is an
artifact of our unusual tuple locking implementation.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Extracting only the columns needed for a query

2019-06-15 Thread Tom Lane
Melanie Plageman  writes:
> While hacking on zedstore, we needed to get a list of the columns to
> be projected--basically all of the columns needed to satisfy the
> query. The two use cases we have for this are
> 1) to pass this column list down to the AM layer for the AM to leverage it
> 2) for use during planning to improve costing
> In other threads, such as [1], there has been discussion about the
> possible benefits for all table types of having access to this set of
> columns.

The thing that most approaches to this have fallen down on is triggers ---
that is, a trigger function might access columns mentioned nowhere in the
SQL text.  (See 8b6da83d1 for a recent example :-()  If you have a plan
for dealing with that, then ...

> Approach B: after parsing and/or after planning

If we wanted to do something about this, making the planner record
the set of used columns seems like the thing to do.  We could avoid
the expense of doing it when it's not needed by setting up an AM/FDW/
etc property or callback to request it.

Another reason for having the planner do this is that presumably, in
an AM that's excited about this, the set of fetched columns should
play into the cost estimates for the scan.  I've not been paying
enough attention to the tableam work to know if we've got hooks for
the AM to affect scan costing ... but if we don't, that seems like
a hole that needs plugged.

> I'm not sure, however, if just getting the
> PathTargets from the RelOptInfos is sufficient for obtaining the
> whole set of columns used in the query.

The PathTarget only records the columns that need to be *emitted*
by the relation scan plan node.  You'd also have to look at the
baserestrictinfo conditions to see what columns are inspected by
filter conditions.  But that seems fine to me anyway since the AM
might conceivably have different requirements for filter variables than
emitted variables.  (As a concrete example, filter conditions that are
handled by non-lossy index checks might not give rise to any requirement
for the table AM to do anything.)  So that line of thought confirms that
we want to do this at the end of planning when we know the shape of the
plan tree; the parser can't do it usefully.
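
Concretely, something like this could compute the set for a base rel at
that point (a sketch assuming v12 planner structures; the function name
is made up and this is not committed code):

    #include "postgres.h"
    #include "nodes/pg_list.h"
    #include "nodes/pathnodes.h"
    #include "optimizer/optimizer.h"    /* pull_varattnos() */

    /*
     * Sketch: attnos a base relation's scan must fetch (offset by
     * FirstLowInvalidHeapAttributeNumber, as pull_varattnos does),
     * combining emitted columns with filter-condition columns.
     */
    static Bitmapset *
    scan_attrs_needed(RelOptInfo *rel)
    {
        Bitmapset  *attrs = NULL;
        ListCell   *lc;

        /* columns the scan node must emit */
        pull_varattnos((Node *) rel->reltarget->exprs, rel->relid, &attrs);

        /* columns inspected by filter conditions */
        foreach(lc, rel->baserestrictinfo)
        {
            RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);

            pull_varattnos((Node *) rinfo->clause, rel->relid, &attrs);
        }

        return attrs;
    }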

> Approach B, however, does not work for utility statements which do
> not go through planning.

I'm not sure why you're excited about that case?  Utility statements
tend to be pretty much all-or-nothing as far as data access goes.

regards, tom lane




Re: Improve handling of pg_stat_statements handling of bind "IN" variables

2019-06-15 Thread Pavel Trukhanov
Thanks for the feedback.

I thought once again about separating IN from ARRAY, with refactoring
etc. as Tom suggested, and now I don't think it's worth doing. I've
tried to implement it, and besides requiring changes in every part of
the query processing pipeline, it seems that most of the time I would
have to repeat (copy/paste) for the IN case all the code that now works
for ARRAY. At first I thought there would be simplifications justifying
such refactoring; i.e., I thought I could at least drop the "multidims"
bool that tells ARRAY[] from ARRAY[ARRAY[]]. But it turns out that's
not the case: one can write something like "x IN (ARRAY[1], ARRAY[1,2])",
which results in a multidimensional IN-list array.

So I don't think there's actually enough benefit to splitting those two
apart.

Now I want to do this: just add meta info (basically a bool field) to
the ArrayExpr struct, so at later stages we can tell whether it is the
ArrayExpr of an ARRAY or of an IN list, plus skip updating the Jumble
for the expression subtree within an IN-list array.

If that approach doesn't seem too bad to anyone, I would like to go
forward and submit a patch; it seems pretty straightforward to
implement.
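
For reference, roughly this (the ArrayExpr struct as it stands in
primnodes.h, with the proposed flag sketched in; the field name is just
illustrative):

    typedef struct ArrayExpr
    {
        Expr        xpr;
        Oid         array_typeid;   /* type of expression result */
        Oid         array_collid;   /* OID of collation, or InvalidOid if none */
        Oid         element_typeid; /* common type of array elements */
        List       *elements;       /* the array elements or sub-arrays */
        bool        multidims;      /* true if elements are sub-arrays */
        bool        from_in_list;   /* proposed: built from an IN (...) list */
        int         location;       /* token location, or -1 if unknown */
    } ArrayExpr;

pg_stat_statements' jumbling could then check from_in_list and skip the
element subtree.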

Thoughts?

Thank you.
 ---
Pasha Trukhanov




Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-15 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jun-14, Tom Lane wrote:
>> BTW, after looking around a bit I wonder if this complaint isn't
>> exposing an actual logic bug.  Shouldn't skip_tuple_lock have
>> a lifetime similar to first_time?

> I think there are worse problems here.  I tried the attached isolation
> spec.  Note that the only difference in the two permutations is that s0
> finishes earlier in one than the other; yet the first one works fine and
> the second one hangs until killed by the 180s timeout.  (s3 isn't
> released for a reason I'm not sure I understand.)

Ugh.

> I don't think I'm going to have time to investigate this deeply over the
> weekend, so I think the safest course of action is to revert this for
> next week's set.

+1.  This is an old bug, we don't have to improve it for this release.

regards, tom lane




assertion at postmaster start

2019-06-15 Thread Alvaro Herrera
Once in a blue moon I get this assertion failure on server start:

2019-06-15 12:00:29.650 -04 [30080] LOG:  starting PostgreSQL 12beta1 on 
x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04) 7.4.0, 64-bit
2019-06-15 12:00:29.650 -04 [30080] LOG:  listening on IPv4 address 
"127.0.0.1", port 55432
2019-06-15 12:00:29.650 -04 [30080] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.55432"
2019-06-15 12:00:29.658 -04 [30956] LOG:  database system was shut down 
at 2019-06-15 12:00:24 -04
2019-06-15 12:00:29.659 -04 [30080] LOG:  server process (PID 30107) 
exited with exit code 15
2019-06-15 12:00:29.659 -04 [30080] LOG:  terminating any other active 
server processes
TRAP: FailedAssertion("!(AbortStartTime == 0)", File: 
"/pgsql/source/master/src/backend/postmaster/postmaster.c", Line: 2957)
Aborted (core dumped)

I cannot readily reproduce this.  In essence, the log shows a normal
startup until process 30107 suddenly terminates with exit code 15, and
then, while shutting everything down, the postmaster hits the
aforementioned assertion and aborts.

One problem with debugging this is that I don't know what process 30107
is, since the logs don't mention it.

No idea what is going on.  But I'm going to set my script to start the
server with log_min_messages=debug1, in case I hit it again ...

Has anybody else seen this?

-- 
Álvaro Herrera




Fix typos and inconsistencies for v11+

2019-06-15 Thread Alexander Lakhin
Hello hackers,

Please consider fixing the following typos and inconsistencies living in
the source code starting from v11:
3.1 add_placeholders_to_child_joinrel -> remove (orphaned after 7cfdc770)
3.2 AttachIndexInfo -> IndexAttachInfo (an internal inconsistency)
3.3 BlockRecordInfo -> BlockInfoRecord (an internal inconsistency)
3.4 bount -> bound (a typo)
3.5 CopyBoth -> CopyBothResponse (an internal inconsistency)
3.6 directy -> directory (a typo)
3.7 ExecCreateSlotFromOuterPlan -> ExecCreateScanSlotFromOuterPlan (an
internal inconsistency)
3.8 es_epqTuple -> es_epqTupleSlot (an internal inconsistency)
3.9 ExecHashTableParallelInsert -> ExecParallelHashTableInsert (an
internal inconsistency)
3.10 ExecMakeFunctionResult -> ExecMakeFunctionResultSet (an internal
inconsistency)
3.11 fmgr_builtins_oid_index -> fmgr_builtin_oid_index (an internal
inconsistency)
3.12 freeScanStack -> remove (irrelevant after 2a636834, v12 only)
3.13 GetTupleTrigger -> GetTupleForTrigger (an internal inconsistency)
3.14 grow_barrier -> grow_batches_barrier (an internal inconsistency)
3.15 HAVE__BUIILTIN_CLZ -> HAVE__BUILTIN_CLZ (a typo, v12 only)
3.16 ignored_killed_tuples -> ignore_killed_tuples (a typo)
3.17 intset_tests_stats -> intset_test_stats (an internal inconsistency,
v12 only)
3.18 is_aggregate -> objtype (needed to determine error handling and
required result type) (an internal inconsistency)
3.19 iterate_json_string_values -> iterate_json_values (renamed in 1c1791e0)
3.20 $log_number -> remove (not used since introduction in ed8a7c6f)
3.21 mechinism -> mechanism (a typo)
3.22 new_node, new_node_item -> child, child_key (an internal
inconsistency, v12 only)
3.23 new_part_constaints -> new_part_constraints (a typo)
3.24 parentIndexOid -> parentIndexId (for the sake of consistency, but
this argument is still unused since 8b08f7d4)
3.25 particiant -> participant (a typo)
3.26 PathNameCreateShared -> SharedFileSetCreate (an internal inconsistency)
3.27 PathnameCreateTemporaryDir -> PathNameCreateTemporaryDir (an
inconsistent case)
3.28 pg_access_server_files -> pg_read_server_files or
pg_write_server_files (non-existing role referenced)
3.29 pg_beginmessage_reuse -> pq_beginmessage_reuse (a typo)
3.30 Form_pg_fdw & pg_fdw -> Form_pg_foreign_data_wrapper &
pg_foreign_data_wrapper (an internal inconsistency)
3.31 PG_MCV_LIST -> pg_mcv_list (an internal inconsistency, v12 only)
3.32 pg_partition_table -> pg_partitioned_table (an internal inconsistency)
3.33 pg_write -> pg_pwrite (an internal inconsistency, v12 only)
3.34 PLyObject_FromJsonb -> PLyObject_FromJsonbContainer (an internal
inconsistency)
3.35 port_win32.h -> win32_port.h (an internal inconsistency)
3.36 PruneCtxStateIdx -> PruneCxtStateIdx (an internal inconsistency)
3.37 SetErrormode -> SetErrorMode (an internal inconsistency)
3.38 SharedRecordTypemodRegistry -> SharedRecordTypmodRegistry (an
internal inconsistency)
3.39 SharedTupleStore -> SharedTuplestore (an internal inconsistency)
3.40 shm_mq_get_receive_bytes -> shm_mq_receive_bytes (an internal
inconsistency)
3.41 t_natts -> number-of-attributes (questionable) (renamed in
storage.sgml with 3e23b68d, but one reference is left)
3.42 tts_buffer -> remove (orphaned after 4da597ed, v12 only)
3.43 tts_flag -> tts_flags (an internal inconsistency, v12 only)
3.44 tts_off -> remove (orphaned after 4da597ed, v12 only)
3.45 _vaues -> _values (a typo)
3.46 wait_event_class -> wait_event_type (an internal inconsistency)
3.47 WarnNoTranactionBlock -> WarnNoTransactionBlock (a typo)
3.48 with-wal-segsize -> remove (orphaned after fc49e24f)
3.49 XLOG_SEG_SIZE -> WAL segment size (orphaned after fc49e24fa)

Two summary patches for REL_11_STABLE and master are attached.

Best regards,
Alexander

diff --git a/contrib/jsonb_plpython/jsonb_plpython.c b/contrib/jsonb_plpython/jsonb_plpython.c
index 1bc984d5c4..1c93a80fbc 100644
--- a/contrib/jsonb_plpython/jsonb_plpython.c
+++ b/contrib/jsonb_plpython/jsonb_plpython.c
@@ -133,7 +133,7 @@ PLyObject_FromJsonbValue(JsonbValue *jsonbValue)
 }
 
 /*
- * PLyObject_FromJsonb
+ * PLyObject_FromJsonbContainer
  *
  * Transform JsonbContainer to PyObject.
  */
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index 3bd0010bf8..1fd65f30df 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -360,7 +360,7 @@ apw_load_buffers(void)
 		Oid			current_db = blkinfo[j].database;
 
 		/*
-		 * Advance the prewarm_stop_idx to the first BlockRecordInfo that does
+		 * Advance the prewarm_stop_idx to the first BlockInfoRecord that does
 		 * not belong to this database.
 		 */
 		j++;
@@ -369,7 +369,7 @@ apw_load_buffers(void)
 			if (current_db != blkinfo[j].database)
 			{
 				/*
-				 * Combine BlockRecordInfos for global objects with those of
+				 * Combine BlockInfoRecords for global objects with those of
 				 * the database.
 				 */
 				if (current_db != InvalidOid)
@@ -382,7 +382,7 @@ apw_load_buffers(void)
 
 		/*
 		 

Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-06-15 Thread Bruce Momjian
On Fri, Jun 14, 2019 at 09:37:37PM -0400, Joe Conway wrote:
> On 6/14/19 6:09 PM, Bruce Momjian wrote:
> > On Fri, Jun 14, 2019 at 02:27:17PM -0400, Joe Conway wrote:
> >> On 6/13/19 11:07 AM, Bruce Momjian wrote:
> >> > In addition, while the 8k blocks would use a block cipher, the WAL would
> >> > likely use a stream cipher, and it will be very hard to use multiple
> >> > stream ciphers in a single WAL file.
> >> 
> >> I don't understand why we would not just use AES for both.
> > 
> > Uh, AES is an encryption cipher.  You can use it with block mode, CBC,
> > or stream mode, CTR, GCM;  see:
> 
> 
> AES is a block cipher, not a stream cipher. Yes you can use it in
> different modes, including chained modes (and CBC is what I would pick),
> but I assumed you were talking about an actual stream cipher algorithm.

OK, to be specific, I was thinking of using aes128-cbc for the 8k pages
and aes128-ctr for the WAL.
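
For the WAL that would mean a building block roughly like this (a
minimal sketch using OpenSSL's EVP interface, not PostgreSQL code;
key/IV management and error handling elided):

    #include <openssl/evp.h>

    /*
     * Encrypt a WAL chunk with AES-128 in CTR mode.  CTR turns the
     * block cipher into a stream cipher, so inlen need not be a
     * multiple of the 16-byte AES block.
     */
    static int
    encrypt_wal_chunk(const unsigned char key[16], const unsigned char iv[16],
                      const unsigned char *in, int inlen, unsigned char *out)
    {
        EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
        int         len,
                    total;

        EVP_EncryptInit_ex(ctx, EVP_aes_128_ctr(), NULL, key, iv);
        EVP_EncryptUpdate(ctx, out, &len, in, inlen);
        total = len;
        EVP_EncryptFinal_ex(ctx, out + total, &len);    /* no-op: CTR needs no padding */
        total += len;
        EVP_CIPHER_CTX_free(ctx);
        return total;
    }

The 8k-page side would use the CBC counterpart, EVP_aes_128_cbc(), with
padding disabled, since a page is already a multiple of the block size.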

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +