Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-29 Thread Craig Ringer
On Thu, 30 Jan 2020 at 02:04, Hamid Akhtar  wrote:
>
> So having seen the feedback on this thread, and I tend to agree with most of
> what has been said here, I also agree that the server core isn't really the
> ideal place to handle orphaned prepared transactions.
>
> Ideally, these should be handled by a transaction manager; however, I do
> believe that we cannot let the database suffer for the failure of an external
> piece of software, and we made a similar change with the introduction of the
> idle-in-transaction timeout behavior.

The difference, IMO, is that idle-in-transaction aborts don't affect
anything we've promised to be durable.

Once you PREPARE TRANSACTION the DB has made a promise that that txn
is durable. We don't have any consistent feedback channel back to
applications to say "Hey, if you're not going to finish this up we
need to get rid of it soon, ok?". If a distributed transaction manager
gets consensus for commit and goes to COMMIT PREPARED a previously
prepared txn only to find that it has vanished, that's a major
problem, and one that may bring the entire DTM to a halt until the
admin can intervene.

This isn't like idle-in-transaction aborts. It's closer to something
like uncommitting a previously committed transaction.

I do think it'd make sense to ensure that the documentation clearly
highlights the impact of abandoned prepared xacts on server resource
retention and performance, preferably with pointers to appropriate
views. I haven't reviewed the docs to see how clear that is already.

I can also see an argument for a periodic log message (maybe from
vacuum?) warning when old prepared xacts hold xmin down, including one
sent to the client application when an explicit VACUUM is executed.
(In fact, it'd make sense to generalise that for all xmin retention.)
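
In the meantime it's easy enough to check by hand; something like this
(just a sketch) lists prepared xacts oldest-first with their xmin age:

SELECT gid, owner, database, prepared,
       age(transaction) AS xmin_age
FROM pg_prepared_xacts
ORDER BY prepared;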

But I'm really not a fan of aborting such txns. If you operate with
some kind of broken global transaction manager that can forget or
abandon prepared xacts, then fix it, or adopt site-local periodic
cleanup tasks that understand your site's needs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Physical replication slot advance is not persistent

2020-01-28 Thread Craig Ringer
On Tue, 28 Jan 2020 at 16:01, Michael Paquier  wrote:
>
> On Tue, Jan 21, 2020 at 02:07:30PM +0900, Michael Paquier wrote:
> > On Mon, Jan 20, 2020 at 11:00:14AM -0800, Andres Freund wrote:
> >> Hm. I think testing this with real LSNs is a better idea. What if the
> >> node actually already is past FF/ at this point? Quite unlikely,
> >> I know, but still. I.e. why not get the current LSN after the INSERT,
> >> and assert that the slot, after the restart, is that?
> >
> > Sure.  If not disabling autovacuum in the test, we'd just need to make
> > sure that the advancing is at least ahead of the INSERT position.
>
> Actually, as the advancing happens only up to this position we just
> need to make sure that the LSN reported by the slot is the same as the
> position advanced to.  I have switched the test to just do that
> instead of using a fake LSN.
>
> > Anyway, I am still not sure if we should go down the road of just
> > marking the slot as dirty if any advancing is done and letting the
> > follow-up checkpoint do the work, if the advancing function should do
> > the slot flush, or if we choose one and make the other an optional
> > choice (not for back-branches, obviously).  Based on my reading of the
> > code, my guess is that a flush should happen at the end of the
> > advancing function.
>
> I have been chewing on this point for a couple of days, and as we may
> actually crash between the moment the slot is marked as dirty and the
> moment the slot information is made consistent, we still have a risk
> of the slot going backwards even if the slot information is saved.
> The window is much narrower, but well, the docs of logical decoding
> mention that this risk exists.  And the patch becomes much simpler
> without changing the actual behavior present since the feature was
> introduced for logical slots.  There could be a point in
> having a new option to flush the slot information, or actually a
> separate function to flush the slot information, but let's keep that
> for a future possibility.
>
> So attached is an updated patch which addresses the problem just by
> marking a physical slot as dirty if any advancing is done.  Some
> documentation is added about the fact that an advance is persistent
> only at the follow-up checkpoint.  And the tests are fixed to not use
> a fake LSN but instead advance to the latest LSN position produced.
>
> Any objections?

LGTM. Thank you.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-27 Thread Craig Ringer
On Thu, 23 Jan 2020 at 15:04, Michael Paquier  wrote:

> It seems to me that what you are describing here is a set of
> properties good for a monitoring tool that we don't necessarily need
> to maintain in core.  There are already tools able to do that in ways
> I think are better than what we could ever design, like
> check_pgactivity and such.

I really have to disagree here.

Relying on external tools gives users who already have to piece
together a lot of fragments even more moving parts to keep track of.
It introduces more places where new server releases may not be
supported in a timely manner by various tools users rely on. More
places where users may get wrong or incomplete information from
outdated or incorrect tools. I cite the monstrosity that
"check_postgres.pl" has become as a specific example of why pushing
our complexity onto external tools is not always the right answer.

We already have a number of views that prettify information to help
administrators operate the server. You could argue that
pg_stat_activity and pg_stat_replication are unnecessary for example;
users should use external tools to query pg_stat_get_activity(),
pg_stat_get_wal_senders(), pg_authid and pg_database directly to get
the information they need. Similarly, we could do away with
pg_stat_user_indexes and the like, as they're just convenience views
over lower level information exposed by the server.

But can you really imagine using postgres day to day without pg_stat_activity?

It is my firm opinion that visibility into locking behaviour and lock
waits is of a similar level of importance. So is giving users some way
to get insight into table and index bloat on our MVCC database. With
the enormous uptake of various forms of replication and HA it's also
important that users be able to see what's affecting resource
retention - holding back vacuum, retaining WAL, etc.

The server knows more than any external tool can. Views in the server
can also be maintained along with the server to address changes in how
it manages things like resource retention, so external tools get a more
consistent insight into server behaviour.

> I'd rather just focus in the core code on the basics with views
> that map directly to what we have in memory and/or disk.

Per above, I just can't agree with this. PostgreSQL is a system with
end users who need to interact with it, most of whom will not know how
its innards work. If we're going to position it even more as a
component in some larger stack such that it's not expected to really
be used standalone, then we should make some effort to guide users
toward the other components they will need *in our own documentation*
and ensure they're tested and maintained.

Proposals to do that with HA and failover tooling, backup tooling etc
have never got off the ground. I think we do users a great disservice
there personally. I don't expect any proposal to bless specific
monitoring tools to be any more successful.

More importantly, I fail to see why every monitoring tool should
reinvent the same information collection queries and views, each with
their own unique bugs and quirks, when we can provide information
users need directly from the server.

In any case I guess it's all hot air unless I pony up a patch to show
how I think it should work.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-22 Thread Craig Ringer
On Thu, 23 Jan 2020 at 01:20, Tom Lane  wrote:
>
> Bruce Momjian  writes:
> > I think the big question is whether we want to make active prepared
> > transactions more visible to administrators, either during server start
> > or idle duration.
>
> There's already the pg_prepared_xacts view ...

I think Bruce has a point here. We shouldn't go around "resolving"
prepared xacts, but the visibility of them is a problem for users.
I've seen that myself quite enough times, even now that they cannot be
used by default.

Our monitoring and admin views are not keeping up with Pg's
complexity. Resource retention is one area where that's becoming a
usability and admin challenge. If a user has growing bloat (and has
managed to figure that out, since we don't make it easy to do that
either) or unexpected WAL retention they may find it hard to quickly
work out why.

We could definitely improve on that by exposing a view that integrates
everything that holds down xmin and catalog_xmin. It'd show

* the datfrozenxid and datminmxid for the oldest database
  * if that database is the current database, the relation(s) with the
oldest relfrozenxid and relminmxid
  * ... and the catalog relation(s) with the oldest relfrozenxid and
relminmxid if greater
* the absolute xid and xid-age positions of entries in pg_replication_slots
* pg_stat_replication connections (joined to pg_replication_slots if
using a slot) with their feedback xmin
* pg_stat_activity backend_xid and backend_xmin for the backend(s)
with the oldest values; these may be different sets of backends
* pg_prepared_xacts entries by oldest xid

... probably sorted by xid age.

It'd be good to expose some internal state too, which would usually
correspond to the oldest values found in the above, but is useful for
cross-checking:

* RecentGlobalXmin and RecentGlobalDataXmin to show the xmin and
catalog_xmin actually used
* procArray->replication_slot_xmin and procArray->replication_slot_catalog_xmin

I'm not sure whether WAL retention (lsn tracking) should be in the
same view or a different one, but I lean toward different.
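
To make the shape concrete, here's the kind of hand-rolled query such a
view would replace (a rough sketch for illustration, not a proposed
definition):

-- Everything currently holding down xmin, roughly ordered by age
SELECT 'replication slot ' || slot_name AS holder,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
UNION ALL
SELECT 'backend pid ' || pid, age(backend_xmin), NULL
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
UNION ALL
SELECT 'prepared xact ' || gid, age(transaction), NULL
FROM pg_prepared_xacts
ORDER BY xmin_age DESC NULLS LAST;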

I already have another TODO kicking around for me to write a view that
generates a blocking locks graph, since pg_locks is really more of a
building block than a directly useful view for admins to understand
the system's state. And if that's not enough I also want to write a
decent bloat-checking view to include in the system views, since IMO
lock-blocking, bloat, and resource retention are real monitoring pain
points right now.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-22 Thread Craig Ringer
On Wed, 22 Jan 2020 at 23:22, Tom Lane  wrote:
>
> Thomas Kellerer  writes:
> > Tom Lane schrieb am 22.01.2020 um 16:05:
> >> Right.  It's the XA transaction manager's job not to forget uncommitted
> >> transactions.  Reasoning as though no TM exists is not only not very
> >> relevant, but it might lead you to put in features that actually
> >> make the TM's job harder.  In particular, a timeout (or any other
> >> mechanism that leads PG to abort or commit a prepared transaction
> >> of its own accord) does that.
>
> > That's a fair point, but the reality is that not all XA transaction managers
> > do a good job with that.
>
> If you've got a crappy XA manager, you should get a better one, not
> ask us to put in features that make PG unsafe to use with well-designed
> XA managers.

Agreed. Or use some bespoke script that does the cleanup that you
think is appropriate for your particular environment and set of bugs.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Do we need to handle orphaned prepared transactions in the server?

2020-01-22 Thread Craig Ringer
On Wed, 22 Jan 2020 at 16:45, Ants Aasma  wrote:

> The intended use case of two phase transactions is ensuring atomic
> durability of transactions across multiple database systems.

Exactly. I was trying to find a good way to say this.

It doesn't make much sense to embed a 2PC resolver in Pg unless it's
an XA coordinator or similar. And generally it doesn't make sense for
the distributed transaction coordinator to reside alongside one of the
datasources being managed anyway, especially where failover and HA are
in the picture.

I *can* see it being useful, albeit rather heavyweight, to implement
an XA coordinator on top of PostgreSQL. Mostly for HA and replication
reasons. But generally you'd use separate postgres instances for the HA
coordinator and the DB(s) in which 2PC txns are being managed. While
you could run them in the same instance it'd probably mostly be for
toy-scale PoC/demo/testing use.

So I don't really see the point of doing anything with 2PC xacts
within Pg proper. It's the job of the app that prepares the 2PC xacts,
and if that app is unable to resolve them for some reason there's no
generally-correct action to take without administrator action.




Re: Remove page-read callback from XLogReaderState.

2020-01-21 Thread Craig Ringer
On Tue, 21 Jan 2020 at 18:46, Kyotaro Horiguchi  wrote:
>
> Hello.
>
> At Mon, 20 Jan 2020 17:24:07 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > Separating XLogBeginRead seems reasonable.  The annoying recptr trick
> > is no longer needed. In particular some logical-decoding stuff become
> > far cleaner by the patch.
> >
> > fetching_ckpt of ReadRecord is a bit annoying but that couldn't be
> > moved outside to, say, ReadCheckpointRecord in a clean way. (Anyway
> > XLogBeginRead is not the place, of course).
> >
> > As I looked through it, it looks good to me but give me some more time
> > to look closer.
>
> It seems to me that it works perfectly, and everything looks good

I seem to remember some considerable pain in this area when it came to
timeline switches. Especially with logical decoding and xlog records
that split across a segment boundary.

My first attempts at logical decoding timeline following appeared fine
and passed tests until they were put under extended real world
workloads, at which point they exploded when they tripped corner cases
like this.

I landed up writing ridiculous regression tests to trigger some of
these behaviours. I don't recall how many of them made it into the
final patch to core but it's worth a look in the TAP test suite.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: pg13 PGDLLIMPORT list

2020-01-20 Thread Craig Ringer
On Tue, 21 Jan 2020 at 12:49, Michael Paquier  wrote:
>
> On Tue, Jan 21, 2020 at 08:34:05AM +0530, Amit Kapila wrote:
> > As such no objection, but I am not sure if the other person need it on
> > back branches as well.  Are you planning to commit this, or if you
> > want I can take care of it?
>
> Thanks for the reminder.  Done now.  I have also switched the
> surrounding parameters while on it to not be inconsistent.

While speaking of PGDLLIMPORT, I wrote a quick check target in the
makefiles for some extensions I work on. It identified the following
symbols as used by the extensions but not exported:

XactLastRecEnd (xlog.h)
criticalSharedRelcachesBuilt   (relcache.h)
hot_standby_feedback   (walreceiver.h)
pgstat_track_activities  (pgstat.h)
WalRcv  (walreceiver.h)
wal_receiver_status_interval (walreceiver.h)
wal_retrieve_retry_interval (walreceiver.h)

Of those, XactLastRecEnd is by far the most important.

Failure to export pgstat_track_activities is a bug IMO, since it's
referenced by the inline functions pgstat_report_wait_start() and
pgstat_report_wait_end() in pgstat.h.

criticalSharedRelcachesBuilt is useful in extensions that may do genam
systable_beginscan() etc in functions called both early in startup and
later on.

hot_standby_feedback can be worked around by reading the GUC via the
config options interface. But IMO all GUC variables should be marked
PGDLLIMPORT, especially since we lack an interface for extensions to
read arbitrary GUC values w/o formatting to string then parsing the
string.

wal_receiver_status_interval and wal_retrieve_retry_interval are not
that important, but again they're GUCs.

Being able to see WalRcv is very useful when running extension code on
a streaming physical replica, where you want to make decisions based
on what's actually replicated.

Anyone object to exporting these?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Physical replication slot advance is not persistent

2020-01-20 Thread Craig Ringer
() which has to do the decoding
work again. I'd like to improve that, but I didn't intend to confuse
or sidetrack this discussion in the process. Sorry.

We don't have a SQL-level interface for reading physical WAL so there
are no corresponding concerns there.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: Physical replication slot advance is not persistent

2020-01-20 Thread Craig Ringer
On Fri, 17 Jan 2020 at 01:09, Alexey Kondratov
 wrote:
>
> > I think we shouldn't touch the paths used by the replication protocol.
> > Shouldn't we instead focus on how we make a change to a replication slot
> > from the SQL interface persistent?  It seems to me that generally we don't
> > need to save dirty slots other than at checkpoint, but the SQL function
> > seems to want the change to be saved immediately.

PLEASE do not make the streaming replication interface force flushes!

The replication interface should not immediately flush changes to the
slot replay position on advance. It should be marked dirty and left to
be flushed by the next checkpoint. Doing otherwise potentially
introduces a lot of unnecessary fsync()s and may have an unpleasant
impact on performance.

Clients of the replication protocol interface should be doing their
own position tracking on the client side. They should not ever be
relying on the server side restart position for correctness, since it
can go backwards on crash and restart. Any that do rely on it are
incorrect. I should propose a docs change that explains how the server
and client restart position tracking interact on both physical and
logical replication, since it's not really covered right now and naïve
client implementations will be wrong.

I don't really care if the SQL interface forces an immediate flush
since it's never going to have good performance anyway.

It's already impossible to write a strictly correct and crash safe
client with the SQL interface. Adding forced flushing won't make that
any better or worse.

The SQL interface advances the slot restart position and marks the
slot dirty *before the client has confirmed receipt of the data and
flushed it to disk*. So there's a data loss window. If the client
disconnects or crashes before all the data from that function call is
safely flushed to disk it may lose the data, then be unable to fetch
it again from the server because of the restart_lsn position advance.

Really, we should add a "no_advance_position" option to the SQL
interface, then expect the client to call a second function that
explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise
no SQL interface client can be crash-safe.
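
To illustrate the distinction with what already exists (a rough sketch;
'my_slot' and the :flushed_lsn psql variable are placeholders, and the
client is assumed to have durably stored everything up to that LSN before
advancing):

-- Read changes without moving the slot's restart/confirmed positions:
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('my_slot', NULL, NULL);

-- Only once the client has flushed the received data to its own storage:
SELECT pg_replication_slot_advance('my_slot', :'flushed_lsn');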




Re: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-20 Thread Craig Ringer
On Fri, 10 Jan 2020 at 06:16, Andrew Dunstan 
wrote:

> On Fri, Jan 10, 2020 at 8:32 AM Tom Lane  wrote:
> >
> > Andrew Dunstan  writes:
> > > On Fri, Jan 10, 2020 at 1:21 AM Robert Haas 
> wrote:
> > >> I share the concern about the security issue here. I can't testify to
> > >> whether Christoph's whole analysis is correct here, but as a general point,
> > >> non-superusers can't be allowed to do things that cause the server to
> > >> access arbitrary local files.
> >
> > > It's probably fairly easy to do (c.f. 6136e94dcb). I'm not (yet)
> > > convinced that there is any significant security threat here. This
> > > doesn't give the user or indeed any postgres code any access to the
> > > contents of these files. But if there is a consensus to restrict this
> > > I'll do it.
> >
> > Well, even without access to the file contents, the mere ability to
> > probe the existence of a file is something we don't want unprivileged
> > users to have.  And (I suppose) this is enough for that, by looking
> > at what error you get back from trying it.
> >
>
>
> OK, that's convincing enough. Will do it before long.


Thanks. I'm 100% convinced the superuser restriction should be imposed. I
can imagine there being a risk of leaking file contents in error output,
such as parse errors from OpenSSL that we pass on, for example. Tricking Pg
into reading from a FIFO could be problematic too.

I should've applied that restriction from the start, the same way as
passwordless connections are restricted.
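
In other words something like the following stays possible, but only when a
superuser creates or alters the mapping (all names and paths below are
placeholders):

CREATE USER MAPPING FOR some_app_user SERVER remote_pg
    OPTIONS (user 'app',
             sslcert '/etc/postgresql/client.crt',
             sslkey  '/etc/postgresql/client.key');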

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: pgsql: Allow 'sslkey' and 'sslcert' in postgres_fdw user mappings

2020-01-19 Thread Craig Ringer
On Thu, 9 Jan 2020 at 22:38, Christoph Berg  wrote:

> Re: Robert Haas 2020-01-09 <nw+...@mail.gmail.com>
> > Does this mean that a non-superuser can induce postgres_fdw to read an
> > arbitrary file from the local filesystem?
>
> Yes, see my comments in the "Allow 'sslkey' and 'sslcert' in
> postgres_fdw user mappings" thread.


Ugh, I misread your comment.

You raise a sensible concern.

These options should be treated the same as the proposed option to allow
passwordless connections: disallow creation or alteration of FDW connection
strings that use them by non-superusers. So a superuser can define a user
mapping that uses these options, but normal users may not.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Master Master replication

2020-01-19 Thread Craig Ringer
On Mon, 20 Jan 2020 at 12:37, Prabhu R  wrote:

> Hi Team,
>
> I'm looking for the roadmap of the feature - master-master replication.
>
> In Postgresql-10, Bi-Directional replication (BDR3) has been embedded in
> it.
>

It's available as an extension, it's not part of PostgreSQL 10.

PostgreSQL 10 has single-master publish/subscribe.
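
That is, the built-in facility is one-directional publish/subscribe along
these lines (names are placeholders):

-- On the publishing node:
CREATE PUBLICATION my_pub FOR TABLE my_table;

-- On the subscribing node:
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher dbname=appdb user=replicator'
    PUBLICATION my_pub;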


> Are there any plans to have built-in master-master replications in future
> versions of Postgresql 13 (or) 14  ?
>

My understanding is that in the long term that's probably the case, but I
don't speak for 2ndQ or anyone else when saying so.

It's exceedingly unlikely that nontrivial multimaster logical replication
would be possible in PostgreSQL 13 or even 14 given the logical replication
facilities that are currently in the tree. A *lot* of things would need to
be done, and in many cases the approaches used in the current external MM
replication solutions could not be directly adopted without a lot of
redesign and revision as we already saw with the addition of basic logical
replication functionality in PostgreSQL 10.

Personally I'd be delighted to see people working towards this, but right
now I think most contributors are focused elsewhere.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Read Uncommitted

2019-12-19 Thread Craig Ringer
On Fri, 20 Dec 2019 at 12:18, Tom Lane  wrote:

> Craig Ringer  writes:
> > My understanding from reading the above is that Simon didn't propose to
> > make aborted txns visible, only in-progress uncommitted txns.
>
> Yeah, but an "in-progress uncommitted txn" can become an "aborted txn"
> at any moment, and there's no interlock that would prevent its generated
> data from being removed out from under you at any moment after that.
> So there's a race condition, and as Robert observed, the window isn't
> necessarily small.
>

Absolutely. Many of the same issues arise in the work on logical decoding
of in-progress xacts for optimistic logical decoding.

Unless such an interlock is added (with all the problems that entails,
again per the in-progress logical decoding thread) that limits this to:

* running in recovery when stopped at a recovery target; or
* peeking at the contents of individual prepared xacts that we can prevent
someone else from concurrently aborting/committing

That'd cover the only things I'd personally want a feature like this for
anyway.

In any case, Simon's yanked the proposal. I'd like to have some way to peek
at the contents of individual uncommitted xacts, but it's clearly not going
to be anything called READ UNCOMMITTED that applies to all uncommitted
xacts at once...


> > I think the suggestions for a SRF based approach might make sense.
>
> Yeah, I'd rather do it that way than via a transaction isolation
> level.  The isolation-level approach seems like people will expect
> stronger semantics than we could actually provide.
>

Yeah. Definitely not an isolation level.

It'll be interesting to see if this sparks any more narrowly scoped and
targeted ideas, anyway. Thanks for taking the time to think about it.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Read Uncommitted

2019-12-19 Thread Craig Ringer
On Thu, 19 Dec 2019 at 23:36, Andres Freund  wrote:

> Hi,
>
> > On the patch as proposed this wouldn't be possible because a toast row
> > can't be vacuumed and then reused while holding back xmin, at least as I
> > understand it.
>
> Vacuum and pruning remove rows where xmin didn't commit, without testing
> against the horizon. Which makes sense, because normally there's so far
> no snapshot including them. Unless we were to weaken that logic -
> which'd have bloat impacts - a snapshot wouldn't guarantee anything
> about the non-removal of such tuples, unless I am missing something.
>

My understanding from reading the above is that Simon didn't propose to
make aborted txns visible, only in-progress uncommitted txns.

Vacuum only removes such rows if the xact is (a) explicitly aborted in clog
or (b) provably not still running. It checks RecentXmin and the running
xids arrays to handle xacts that went away after a crash. Per
TransactionIdIsInProgress() as used by HeapTupleSatisfiesVacuum(). I see
that it's not *quite* as simple as using the RecentXmin threshold, as xacts
newer than RecentXmin may also be seen as not in-progress if they're absent
in the shmem xact arrays and there's no overflow.

But that's OK so long as the only xacts that some sort of read-uncommitted
feature allows to become visible are ones that
satisfy TransactionIdIsInProgress(); they cannot have been vacuumed.

The bigger issue, and the one that IMO makes it impractical to spell this
as "READ UNCOMMITTED", is that an uncommitted txn might've changed the
catalogs so there is no one snapshot that is valid for interpreting all
possible tuples. It'd have to see only txns that have no catalog changes,
or be narrowed to see just *one specific txn* that had catalog changes.
That makes it iffy to spell it as "READ UNCOMMITTED" since we can't
actually make all uncommitted txns visible at once.

I think the suggestions for a SRF based approach might make sense. Perhaps
a few functions:

* a function to list all in-progress xids

* a function to list in-progress xids with/without catalog changes (if
possible, unsure if we know that until the commit record is written)

* a function (or procedure?) to execute a read-only SELECT or WITH query
within a faked-up snapshot for some in-progress xact and return a SETOF
RECORD with results. If the txn has catalog changes this would probably
have to coalesce each result field with non-builtin data types to text, or
do some fancy validation to compare the definition in the txn snapshot with
the latest normal snapshot used by the calling session. Ideally this
function could take an array of xids and would query with them all visible
unless there were catalog changes in any of them, then it'd ERROR.

* a function to generate the SQL text for an alias clause that maps the
RECORD returned by the above function, so you can semi-conveniently query
it. (I don't think we have a way for a C callable function to supply a
dynamic resultset type at plan-time to avoid the need for this, do we?
Perhaps if we use a procedure not a function?)
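
To make the shape concrete, here's a sketch of what calling such an
interface might look like (every function name here is hypothetical;
nothing like this exists today):

-- Hypothetical interface, names invented purely for illustration:
SELECT * FROM pg_list_in_progress_xids();   -- xid, has_catalog_changes

SELECT *
FROM pg_query_uncommitted_xacts(
         ARRAY['1234'::xid],
         $$ SELECT id, payload FROM my_table $$)
     AS t(id integer, payload text);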



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Questions about PostgreSQL implementation details

2019-12-12 Thread Craig Ringer
On Mon, 9 Dec 2019 at 23:35, Julien Delplanque 
wrote:

> Hello PostgreSQL hackers,
>
> I hope I am posting on the right mailing-list.
>
> I am actually doing a PhD related to relational databases and software
> engineering.
>
> I use PostgreSQL for my research.
>
> I have a few questions about the internals of PostgreSQL and I think they
> require experts knowledge.
>
> I could not find documentation about that in the nice PostgreSQL
> documentation but maybe I missed something? Tell me if it is the case.
>

There are a bunch of README files in the source tree that concern various
innards of PostgreSQL. They're not always referred to by any comments etc,
so you have to know they exist. They're usually well worth reading, though
it can take a while before you understand enough of PostgreSQL's
architecture for them to make sense...

Try

find src/ -name README\*


> Q1. Are PostgreSQL's meta-description tables (such as pg_class) the
> "reality" concerning the state of the DB or are they just a virtual
> representation ?
>

That's been largely answered. But I want to point out an important caveat
that isn't obvious to new people: The oid of a relation (pg_class.oid) is
not the same thing as the pg_class.relfilenode, which is usually the base
of the filename of the on-disk storage for the relation. On an idle or new
database most relations  are created with an equal oid and relfilename, so
it's easy to think the oid maps to the on-disk name of a relation, but it
doesn't. The relation oid will not change so long as the relation exists,
but the relfilenode may change if the table contents are rewritten, etc.
Additionally, there are special tables that are "relmapped" such that they
don't have a normal relfilenode at all, instead access is indirected via a
separate mapping. IIRC that's mainly necessary so we can bootstrap access
to the catalog tables that tell us how to read the catalogs.
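
For example (the table name is a placeholder), you can compare the two
directly:

-- oid and relfilenode start out equal but diverge after a rewrite such as
-- VACUUM FULL, TRUNCATE or a table-rewriting ALTER TABLE:
SELECT oid, relname, relfilenode,
       pg_relation_filenode(oid::regclass) AS current_filenode,
       pg_relation_filepath(oid::regclass) AS on_disk_path
FROM pg_class
WHERE relname = 'some_table';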

What I would like to know with this question is: would it be possible to
> implement DDL queries (e.g. CREATE TABLE, DROP TABLE, CREATE VIEW, ALTER
> TABLE, etc.) as DML queries that modify the meta-data stored in
> meta-description tables?
>

Not really.

PostgreSQL has a caching layer - syscache, relcache, catcache - and
invalidation scheme that it relies on. It doesn't execute regular queries
on the system catalogs. It also has simplifying rules around how they are
updated and accessed. See the logic in genam.c etc. Catalogs may also
represent things that aren't just other DB rows - for example, pg_class
entries are associated with files on disk for individual database tables.

You can't just insert into pg_class, pg_attribute, etc and expect that to
safely create a table. Though it's surprising how much you can get away
with by hacking the catalogs if you're very careful and you trick
PostgreSQL into firing appropriate invalidations. I'd quite like to have a
SQL-exposed way to do a forced global cache flush and invalidation for use
in emergency scary catalog hacking situations.

So you can do quite a bit with direct catalog surgery, but it's dangerous
and if you break the database, you get to keep the pieces.

Q1.1 If it is possible, is what is done in reality? I have the feeling that
> it is not the case and that DDL queries are implemented in C directly.
>

Right. See standard_ProcessUtility() and friends.

Q1.2 If it is possible and not done, what is the reason?
>

Speed - no need to run the full executor. Simplification of catalog access.
Caching and invalidations. Chicken/egg problems: how do you "CREATE TABLE
pg_class"? Lots more.


> Q2. Are PostgreSQL's "meta-constraints" (i.e. constraints related to
> database structure such as "a table can only have a single primary key")
> implemented in C code or via data constraints on PostgreSQL's
> meta-description tables?
>

System catalogs are not permitted to have CONSTRAINTs (CHECK constraints,
UNIQUE constraints, PRIMARY KEY constraints, FOREIGN KEY constraints, etc).

All such management is done in C level logic with the assistance of the
pg_depend catalog and the relationships it tracks.
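
You can inspect that bookkeeping directly if you're curious, e.g. (the
table name is a placeholder):

-- Everything pg_depend records as depending on a given table:
SELECT pg_describe_object(classid, objid, objsubid) AS dependent,
       deptype
FROM pg_depend
WHERE refclassid = 'pg_class'::regclass
  AND refobjid = 'my_table'::regclass;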


> Q2.1 If they are not implemented via data constraints on meta-description
> tables, why ?
>

Same as above.


> Q2.2 Is there somewhere in the documentation a list of such
> "meta-constraints" implemented by PostgreSQL?
>

Not AFAIK.

Why?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Questions about PostgreSQL implementation details

2019-12-12 Thread Craig Ringer
On Tue, 10 Dec 2019 at 01:21, Tom Lane  wrote:

> Mark Dilger  writes:
> > [ useful tips about finding the code that implements a SQL command ]
>
> BTW, if it wasn't obvious already, you *really* want to have some kind
> of tool that easily finds the definition of a particular C symbol.
> You can fall back on "grep -r" or "git grep", but lots of people use
> ctags or etags or some other C-aware indexing tool.
>
>
I strongly recommend cscope with editor integration for your preferred
editor btw.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: A varint implementation for PG?

2019-12-12 Thread Craig Ringer
On Tue, 10 Dec 2019 at 09:51, Andres Freund  wrote:

> Hi,
>
> I several times, most recently for the record format in the undo
> patchset, wished for a fast variable width integer implementation for
> postgres. Using very narrow integers, for space efficiency, solves the
> space usage problem, but leads to extensibility / generality problems.
>

Yes. I've wanted flexible but efficiently packed integers quite a bit too,
especially when working with wire protocols.

Am I stabbing completely in the dark when wondering if this might be a step
towards a way to lift the size limit on VARLENA Datums like bytea?

There are obvious practical concerns with doing so, given that our protocol
offers no handle based lazy fetching for big VARLENA values, but that too
needs a way to represent sizes sensibly and flexibly.


> Even with those caveats, I think that's a pretty good result. Other
> encodings were more expensive. And I think there's definitely some room
> for optimization left.


I don't feel at all qualified to question your analysis of the appropriate
representation. But your explanation certainly makes a lot of sense as
someone approaching the topic mostly fresh - I've done a bit with BCD but
not much else.

I assume we'd be paying a price in padding and alignment in most cases, and
probably more memory copying, but these representations would likely be
appearing mostly in places where other costs are overwhelmingly greater
like network or disk I/O.

> If data lengths longer than that are required for a use case


If baking a new variant integer format now, I think limiting it to 64 bits
is probably a mistake given how long-lived PostgreSQL is, and how hard it
can be to change things in the protocol, on disk, etc.


> it
> probably is better to either a) use the max-representable 8 byte integer
> as an indicator that the length is stored or b) sacrifice another bit to
> represent whether the integer is the data itself or the length.
>

I'd be inclined to suspect that (b) is likely worth doing. If nothing else
because not being able to represent the full range of a 64-bit integer in
the variant type is potentially going to be a seriously annoying hassle at
points where we're interacting with places that could use the full width.
We'd then have the potential for variant integers of > 2^64 but at least
that's wholly under our control.

I also routinely underestimate how truly huge a 64-bit integer really is.
But even now 8 petabytes isn't as inconceivable as it used to be.

It mostly depends on how often you expect you'd be coming up on the
boundaries where the extra bit would push you up a variant size.

> Do others see use in this?


Yes. Very, very much yes.

I'd be quick to want to expose it to SQL too.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: xact_start for walsender & logical decoding not updated

2019-12-12 Thread Craig Ringer
On Wed, 11 Dec 2019 at 02:08, Alvaro Herrera 
wrote:

> On 2019-Dec-10, Tomas Vondra wrote:
>
> > On Tue, Dec 10, 2019 at 09:42:17AM +0900, Kyotaro Horiguchi wrote:
> > > At Tue, 10 Dec 2019 00:44:09 +0100, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote in
>
> > > I'm not sure how much xact_start for walsender is useful and we really
> > > are not running a statement there.  Also autovac launcher starts
> > > transaction without a valid statement timestamp perhaps for the same
> > > reason.
> >
> > Maybe, but then maybe we should change it so that we don't report any
> > timestamps for such processes.
>
> Yeah, I think we should to that.


Agreed. Don't report a transaction start timestamp at all if we're not in a
read/write txn in the walsender, which we should never be when using a
historic snapshot.

It's not interesting or relevant.

Reporting the commit timestamp of the current or last-processed xact would
likely just be confusing. I'd rather see that in pg_stat_replication if
we're going to show it, that way we can label it usefully.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: get_database_name() from background worker

2019-12-12 Thread Craig Ringer
On Thu, 12 Dec 2019 at 16:21, ROS Didier  wrote:

> Hi
> With the pg_background extension, it is possible to make an "autonomous
> transaction", which means the possibility to commit within a transaction.
> It is like a client which connects to a postgresql instance, so you can
> execute any SQL statements.
>
>
Yes, that's possible. It's not easy though and I strongly suggest you look
into existing approaches like using dblink instead.
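
For example, the usual dblink trick looks roughly like this (the connection
string and statement are placeholders, and it needs CREATE EXTENSION
dblink):

-- Runs on a separate connection, so it commits independently of the
-- calling transaction:
SELECT dblink_exec('dbname=mydb',
                   'INSERT INTO audit_log(msg) VALUES (''hello'')');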

Please start a new thread rather than following an unrelated existing one.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Session WAL activity

2019-12-11 Thread Craig Ringer
On Fri, 6 Dec 2019 at 09:57, Michael Paquier  wrote:

> On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
> > Concerning keeping PGPROC size as small as possible, I agree that it is
> > reasonable argument.
> > But even now it is very large (816 bytes) and adding an extra 8 bytes will
> > increase it by less than 1%.
>
> It does not mean that we should add all kind of things to PGPROC as
> that's a structure sensitive enough already.


Right. It's not as critical as PGXACT, but PGPROC is still significant for
scalability and connection count limits.

It's a shame we can't really keep most of it in backend-private memory and
copy it to requestors when they want it, say into a temporary DSA or over a
shm_mq. But our single threaded model means we just cannot rely on backends
being responsive in a timely enough manner to supply data on-demand. That
doesn't mean we have to push it to PGPROC though: we could be sending the
parts that don't have to be super-fresh to the stats collector or a new
related process for active session stats and letting it aggregate them.

That's way beyond the scope of this patch though. So personally I'm OK with
the new PGPROC field. Visibility into Pg's activity is woefully limited and
something we need to prioritize more.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: get_database_name() from background worker

2019-12-11 Thread Craig Ringer
On Wed, 11 Dec 2019 at 14:38, Koichi Suzuki  wrote:

> Hello PG hackers;
>
> I'm writing an extension running on background workers and found
> get_database_name() causes SEGV and found that internally the resource owner
> was set to NULL.   Could anybody let me know how it happens and how I can use this
> function.   Argument to get_database_name() looks correct.
>
>
I think the main question is answered; if the advice given does not help
please supply your code and a backtrace from the crash obtained from a core
file.

However, this reminds me of something. I'd like to make our
syscache/relcache/catcache and all snapshot access functions
Assert(IsTransactionState()); directly or at key locations. That'd make
these mistakes much more obvious - and as bgworkers become a more popular
way to write code for PostgreSQL that's going to be important.

Similarly, it might make sense to assert that we have a valid snapshot in
the SPI, which we don't presently do for read-only SPI calls. I recall that
one biting me repeatedly when I was learning this stuff.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: ssl passphrase callback

2019-12-08 Thread Craig Ringer
On Sat, 7 Dec 2019 at 07:21, Tom Lane  wrote:

> Andrew Dunstan  writes:
> > I've just been looking at that. load_external_function() doesn't
> > actually do anything V1-ish with the value, it just looks up the symbol
> > using dlsym and returns it cast to a PGFunction. Is there any reason I
> > can't just use that and cast it again to the callback function type?
>
> TBH, I think this entire discussion has gone seriously off into the
> weeds.  The original design where we just let a shared_preload_library
> function get into a hook is far superior to any of the overcomplicated
> kluges that are being discussed now.  Something like this, for instance:
>
> >>> ssl_passphrase_command='#superlib.so,my_rot13_passphrase'
>
> makes me positively ill.  It introduces problems that we don't need,
> like how to parse out the sub-parts of the string, and the
> quoting/escaping issues that will come along with that; while from
> the user's perspective it replaces a simple and intellectually-coherent
> variable definition with an unintelligible mess.
>

+1000 from me on that.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


[PATCH] print help from psql when user tries to run pg_restore, pg_dump etc

2019-12-05 Thread Craig Ringer
New users frequently attempt to run PostgreSQL's command line utilities
from the psql prompt.

They tend to be confused when this appears to do absolutely nothing:

psql=> pg_restore
psql->

since they're generally not going to semicolon-terminate the command either.

The attached patch detects common command names when they appear first on a
new input line and prints a help message. If the buffer is empty a more
detailed message is printed and the input is swallowed. Otherwise, much
like how we handle "help" etc,
a short message is printed and the input is still added to the buffer.

psql=> pg_restore
"pg_restore" is a command line utility program.
Use it from the system terminal or command prompt not from psql.
psql=>
psql=> select 1
psql-> pg_restore
"pg_restore" is a command-line utility program not a psql command. See
"help".
psql->

Wording advice would be welcome.

I'd be tempted to backpatch this, since it's one of the things I see users
confused by most often now - right up there with pg_hba.conf issues,
forgetting a semicolon in psql, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
From 451b564fa8714ff4ba21725b183913d651b1c925 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Fri, 6 Dec 2019 12:50:58 +0800
Subject: [PATCH] print help in psql when users try to run pg_dump etc

New users frequently attempt to run PostgreSQL's command line utilities
from the psql prompt. Detect this and emit a useful help message:

psql=> pg_restore
"pg_restore" is a command line utility program.
Use it from the system terminal or command prompt not from psql.
psql=>

Previously we'd just add the command to the buffer - and since the user
wouldn't generally follow it with a semicolon they'd see no response except a
prompt change:

psql=> pg_restore
psql->
---
 src/bin/psql/mainloop.c | 48 +
 1 file changed, 48 insertions(+)

diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index f7b1b94599..2c65b7a862 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -353,6 +353,54 @@ MainLoop(FILE *source)
 #else
 puts(_("Use control-C to quit."));
 #endif
+
+			/*
+			 * New users tend to be confused about the command line utilities
+			 * pg_dump, pg_restore, createdb, etc, and try to run them
+			 * interactively in psql. Detect this and emit a suitable hint.
+			 *
+			 * We check for them even when the buffer is non-empty because
+			 * users frequently try multiple commands without semicolons when
+			 * trying to misuse psql as a command-line shell. Like with "help"
+			 * etc, the text is still added to the buffer unless it's the
+			 * first word on an empty buffer.
+			 */
+			if (line[0] == 'p' || line[0] == 'c' || line[0] == 'd')
+			{
+if (strncmp(line, "pg_restore", sizeof("pg_restore")) == 0
+	|| strncmp(line, "pg_dump", sizeof("pg_dump")) == 0
+	|| strncmp(line, "createdb", sizeof("createdb")) == 0
+	|| strncmp(line, "dropdb", sizeof("dropdb")) == 0
+	|| strncmp(line, "createuser", sizeof("createuser")) == 0
+	|| strncmp(line, "dropuser", sizeof("dropuser")) == 0)
+{
+	char		cmdname[12];
+
+	strlcpy(&cmdname[0], line,
+			Min(strcspn(line, " \r\n") + 1, sizeof(cmdname)));
+	if (query_buf->len != 0)
+	{
+		/*
+		 * Print a short hint but don't swallow the input if
+		 * there's already something in the buffer.
+		 */
+		printf(_("\"%s\" is a command-line utility program "
+ "not a psql command.\n"),
+			   cmdname);
+	}
+	else
+	{
+		/* Swallow the input and emit more help */
+		printf(_("\"%s\" is a command line utility program.\n"
+ "Use it from the system terminal or command "
+ "prompt not from psql.\n"),
+			   cmdname);
+		free(line);
+		fflush(stdout);
+		continue;
+	}
+}
+			}
 		}
 
 		/* echo back if flag is set, unless interactive */
-- 
2.23.0



Re: Add a GUC variable that control logical replication

2019-12-01 Thread Craig Ringer
On Thu, 28 Nov 2019 at 11:53, Michael Paquier  wrote:

> On Wed, Nov 06, 2019 at 10:01:43PM +0800, Quan Zongliang wrote:
> > What the user needs is the same replication link that selectively skips
> some
> > transactions. And this choice only affects transactions that are doing
> bulk
> > delete sessions. The operations of other sessions are not affected and
> can
> > continue to output replication messages.
> > For example, session 1 wants to bulk delete 1 million old data from the
> T1
> > table, which can be done without replication. At the same time, session 2
> > deletes 10 records from T1, which is expected to be passed on through
> > replication.
> > Therefore, the two decoders can not meet this requirement. It is also
> > inappropriate to temporarily disable subscriptions because it skips all
> > transactions for a certain period of time.
>
> Hmm.  The patch discussed on this thread does not have much support
> from Peter and Craig, so I am marking it as RwF.
>
>
Yeah. I'm not against it as such. But I'd like to either see it work by
exposing the ability to use DoNotReplicateId to SQL or if that's not
satisfactory, potentially replace that mechanism with the newly added one
and emulate DoNotReplicateId for BC.

I don't want two orthogonal ways to say "don't consider this for logical
replication".
-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Missing data_sync_elevel() for some calls of pg_fsync()?

2019-12-01 Thread Craig Ringer
On Mon, 2 Dec 2019 at 13:43, Thomas Munro  wrote:

>
> 1.  Some code opens a file, writes some stuff to it, closes, and then
> fsyncs it, and if that fails and it ever retries it'll redo all of
> those steps again.  We know that some kernels might have thrown away
> the data, but we don't care about the copy in the kernel's cache
> because we'll write it out again next time around.
>

Can we trust the kernel to be reporting the EIO or ENOSPC only from
writeback buffers for the actual file we're fsync()ing though? Not from
buffers it flushed while performing our fsync() request, failed to flush,
and complained about?

I'm not confident I want to assume that.
-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: ssl passphrase callback

2019-11-21 Thread Craig Ringer
On Fri, 15 Nov 2019 at 00:34, Andrew Dunstan 
wrote:

>
> On 11/14/19 11:07 AM, Bruce Momjian wrote:
> > On Thu, Nov 14, 2019 at 11:42:05AM +0100, Magnus Hagander wrote:
> >> On Wed, Nov 13, 2019 at 9:23 PM Tomas Vondra <
> tomas.von...@2ndquadrant.com>
> >> I think it would be beneficial to explain why shared object is more
> >> secure than an OS command. Perhaps it's common knowledge, but it's
> not
> >> quite obvious to me.
> >>
> >>
> >> Yeah, that probably wouldn't hurt. It's also securely passing from more
> than
> >> one perspective -- both from the "cannot be eavesdropped" (like putting
> the
> >> password on the commandline for example) and the requirement for
> escaping.
> > I think a bigger issue is that if you want to give people the option of
> > using a shell command or a shared object, and if you use two commands to
> > control it, it isn't clear what happens if both are defined.  By using
> > some character prefix to control if a shared object is used, you can use
> > a single variable and there is no confusion over having two variables
> > and their conflicting behavior.
> >
>
>
> I'm  not sure how that would work in the present instance. The shared
> preloaded module installs a function and defines the params it wants. If
> we somehow unify the params with ssl_passphrase_command that could look
> icky, and the module would have to parse the settings string. That's not
> a problem for the sample module which only needs one param, but it will
> be for other more complex implementations.
>
> I'm quite open to suggestions, but I want things to be tolerably clean.


If someone wants a shell command wrapper, they can load a contrib that
delegates the hook to a shell command. So we can just ship a contrib, which
acts both as test coverage for the feature, and a shell-command-support
wrapper for anyone who desires that.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: checkpointer: PANIC: could not fsync file: No such file or directory

2019-11-21 Thread Craig Ringer
On Thu, 21 Nov 2019 at 09:07, Justin Pryzby  wrote:

> On Tue, Nov 19, 2019 at 07:22:26PM -0600, Justin Pryzby wrote:
> > I was trying to reproduce what was happening:
> > set -x; psql postgres -txc "DROP TABLE IF EXISTS t" -c "CREATE TABLE t(i
> int unique); INSERT INTO t SELECT generate_series(1,99)"; echo
> "begin;SELECT pg_export_snapshot(); SELECT pg_sleep(9)" |psql postgres -At
> >/tmp/snapshot& sleep 3; snap=`sed "1{/BEGIN/d}; q" /tmp/snapshot`;
> PGOPTIONS='-cclient_min_messages=debug' psql postgres -txc "ALTER TABLE t
> ALTER i TYPE bigint" -c CHECKPOINT; pg_dump -d postgres -t t --snap="$snap"
> |head -44;
> >
> > Under v12, with or without the CHECKPOINT command, it fails:
> > |pg_dump: error: query failed: ERROR:  cache lookup failed for index 0
> > But under v9.5.2 (which I found quickly), without CHECKPOINT, it instead
> fails like:
> > |pg_dump: [archiver (db)] query failed: ERROR:  cache lookup failed for
> index 16391
> > With the CHECKPOINT command, 9.5.2 works, but I don't see why it should
> be
> > needed, or why it would behave differently (or if it's related to this
> crash).
>
> Actually, I think that's at least related to documented behavior:
>
> https://www.postgresql.org/docs/12/mvcc-caveats.html
> |Some DDL commands, currently only TRUNCATE and the table-rewriting forms
> of ALTER TABLE, are not MVCC-safe. This means that after the truncation or
> rewrite commits, the table will appear empty to concurrent transactions, if
> they are using a snapshot taken before the DDL command committed.
>
> I don't know why CHECKPOINT allows it to work under 9.5, or if it's even
> related to the PANIC ..


The PANIC is a defense against potential corruptions that can be caused by
some kinds of disk errors. It's likely that we used to just ERROR and
retry, then the retry would succeed without getting upset.

fsync_fname() is supposed to ignore errors for files that cannot be opened.
But that same message may be emitted by a number of other parts of the
code, and it looks like you didn't have log_error_verbosity = verbose so we
don't have file/line info.
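
(For next time, something like this turns that on; it needs superuser or an
equivalent role:)

ALTER SYSTEM SET log_error_verbosity = 'verbose';
SELECT pg_reload_conf();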

The only other place I see that emits that error where a relation path
could be a valid argument is in rewriteheap.c
in logical_end_heap_rewrite(). That calls the vfd layer's FileSync() and
assumes that any failure is a fsync() syscall failure. But FileSync() can
return failure if we fail to reopen the underlying file managed by the vfd
too, per FileAccess().

Would there be a legitimate case where a logical rewrite file mapping could
vanish without that being a problem? If so, we should probably be more
tolerant there.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: PITR on DROP DATABASE, deleting of the database directory despite the recovery_target_time set before.

2019-11-18 Thread Craig Ringer
On Mon, 18 Nov 2019 at 18:48, Nicolas Lutic  wrote:

> Dear Hackers,
>
> After a drop database
>

with FORCE?


> , he tried to recover the data on the last inserted transaction by using
> the recovery_target_time.
> The issue is the database is present in the system catalog but the
> directory was still deleted.
> Here the technical information of the database
> version 11
> default  postgresql.conf except for this options
> wal_level = replica
> archive_mode = on
> archive_command = 'cp %p /tmp/wal_archive/%f '
> log_statement = 'all'
> log_min_messages = debug5
>
>
> The following method was used
>
>- create cluster
>
>
>- create database
>
>
>- create 1 table
>
>
>- create 1 index on 1 column
>
>
>- insert 1 rows
>
>
>- backup with pg_base_backup
>
>
>- insert 2 rows
>
> autocommit?

>
>
>
>- drop database
>
> force?


>
>- Change recovery behaviour in that case to prevent all xact
>operation to perform until COMMIT timestamp is checked against
>recovery_time bound (but it seems to be difficult as state
>
> https://www.postgresql.org/message-id/flat/20141125160629.GC21475%40msg.df7cb.de
>which also identifies the problem and tries to give some solutions.  Maybe
>another way, as a trivial guess (all apologises) is to buffer immediate
>xacts until we have the commit for each and apply the whole buffer xact
>once the timestamp known (and checked agains recovery_target_time value);
>
>
>- The other way to improve this is to update PostgreSQL
>documentation  by specifying that recovery_target_time cannot be used
>in this case. There should be multiple places where it can be stated.
>The best one (if only one) seems to be in
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=
>
> <https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=doc/src/sgml/config.sgml;h=f83770350eda5625179526300c652f23ff29c9fe;hb=HEAD#l3400>
>
>
If this only happens when a DB is dropped under load with force, I lean
toward just documenting it as a corner case.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: physical slot xmin dependency on logical slot?

2019-11-18 Thread Craig Ringer
On Tue, 19 Nov 2019 at 05:37, Jeremy Finzel  wrote:

> We had a scenario today that was new to us.  We had a logical replication
> slot that was severely far behind.  Before dropping this logical slot, we
> made a physical point-in-time-recovery snapshot of the system with this
> logical slot.
>
> This logical slot was causing severe catalog bloat.  We proceeded to drop
> the logical slot which was over 12000 WAL segments behind.  The physical
> slot was only a few 100 segments behind and still in place.
>
> But now proceeding to VAC FULL the catalog tables did not recover any
> bloat beyond the now-dropped logical slot.  Eventually to our surprise, we
> found that dropping the physical slot allowed us to recover the bloat.
>
> We saw in forensics after the fact that xmin of the physical slot equaled
> the catalog_xmin of the logical slot.  Is there some dependency here where
> physical slots made of a system retain all transactions of logical slots it
> contains as well?  If so, could someone help us understand this, and is
> there documentation around this?  Is this by design?
>

I expect that you created the replica in a manner that preserved the
logical replication slot on it. You also had hot_standby_feedback enabled.

PostgreSQL standbys send the global xmin and (in Pg10+) catalog_xmin to the
upstream when hot_standby_feedback is enabled. If there's a slot holding
the catalog_xmin on the replica down, that'll be passed on via
hot_standby_feedback to the upstream. On Pg 9.6 or older, or if the replica
isn't using a physical replication slot, the catalog_xmin is treated as a
regular xmin since there's nowhere in PGPROC or PGXACT to track the
separate catalog_xmin. If the standby uses a physical slot, then on pg10+
the catalog_xmin sent by the replica is stored as the catalog_xmin on the
physical slot instead.

Either way, if you have hot_standby_feedback enabled on a standby, that
feedback includes the requirements of any replication slots on the standby.
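
For illustration, the dependency is visible directly in pg_replication_slots
on the upstream: the standby's feedback lands in the xmin and catalog_xmin
columns of the physical slot it uses. A quick check (slot name illustrative):

  SELECT slot_name, slot_type, active, xmin, catalog_xmin
  FROM pg_replication_slots
  WHERE slot_name = 'standby1_physical_slot';

A non-null, old catalog_xmin on that physical slot is exactly what kept your
VACUUM FULL from reclaiming the catalog bloat.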

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [bug fix] Produce a crash dump before main() on Windows

2019-11-10 Thread Craig Ringer
On Mon, 23 Jul 2018 at 16:45, Michael Paquier  wrote:

> On Mon, Jul 23, 2018 at 08:16:52AM +, Tsunakawa, Takayuki wrote:
> > I guess that is due to some missing files related to the libraries
> > listed in shared_preload_library.  If so, no, because this patch
> > relates to failure before main().
>
> No, I really mean a library dependency failure.  For example, imagine
> that Postgres is compiled on Windows dynamically, and that it depends on
> libxml2.dll, which is itself compiled dynamically.  Then imagine, in a
> custom build ecosystem, that a folk comes in and adds lz support to
> libxml2 on Windows.  If Postgres still consumes libxml2 but does not add
> in its PATH a version of lz, then a backend in need of libxml2 would
> fail to load, causing Postgres to not start properly.  True, painful,
> story.
>

What's super fun about this is the error message. Guess which error Windows
will emit? (Some paraphrasing for exact wording):

* "There was a problem starting C:\Program
Files\PostgreSQL\11\bin\postgres.exe: The specified module could not be
found."

* "There was a problem starting C:\Program
Files\PostgreSQL\11\bin\postgres.exe: module "libxml2.dll" could not be
found."

* "The program can't start because "libxml2.dll" is missing from your
computer."

* "The program "C:\Program Files\PostgreSQL\11\bin\postgres.exe" can't
start because an error was encountered when loading required library
"C:\Program Files\PostgreSQL\11\lib\libxml.dll": libxml.dll requires
"liblz.dll" but it could not be found."

Hint: not the last one.

It'll at best complain about libxml.dll being missing, despite it being
very obviously present.

I had to go hunt around with Dependency Walker to figure out the actual
missing DLL the last time I had to deal with this.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [bug fix] Produce a crash dump before main() on Windows

2019-11-10 Thread Craig Ringer
On Wed, 18 Jul 2018 at 12:10, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> At Wed, 18 Jul 2018 11:12:06 +0800, Craig Ringer 
> wrote in  yqtpidgg...@mail.gmail.com>
> > On 26 February 2018 at 12:06, Tsunakawa, Takayuki <
> > tsunakawa.ta...@jp.fujitsu.com> wrote:
> >
> > > From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> > > > The patch proposed here means that early crashes will invoke WER. If
> > > we're
> > > > going to allow WER we should probably just do so unconditionally.
> > > >
> > > > I'd be in favour of leaving WER on when we find out we're in a
> > > noninteractive
> > > > service too, but that'd be a separate patch for pg11+ only.
> > >
> > > As for PG11+, I agree that we want to always leave WER on.  That is,
> call
> > > SetErrorMode(SEM_FAILCRITICALERRORS) but not specify
> > > SEM_NOGPFAULTERRORBOX.  The problem with the current specification of
> > > PostgreSQL is that the user can only get crash dumps in a fixed folder
> > > $PGDATA\crashdumps.  That location is bad because the crash dumps will
> be
> > > backed up together with the database cluster without the user noticing
> it.
> > > What's worse, the crash dumps are large.  With WER, the user can
> control
> > > the location and size of crash dumps.
> > >
> >
> > Yeah, that's quite old and dates back to when Windows didn't offer much
> if
> > any control over WER in services.
>
> Yeah. If we want to take a crash dump, we cannot have
> auto-restart. Since it is inevitable what we can do for this
> would be adding a new knob for that, which cannot be turned on
> together with restart_after_crash...?


Why?

Mind you, I don't much care about restart_after_crash, I think it's
thoroughly obsolete. Windows has been capable of restarting failed services
forever, and systemd does so too. There's little reason to have postgres
try to do its own self-recovery now, and I prefer to disable it so the
postmaster can cleanly exit and get a fresh new launch.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Allow cluster_name in log_line_prefix

2019-11-10 Thread Craig Ringer
On Sun, 3 Nov 2019 at 07:22, Vik Fearing 
wrote:

> On 31/10/2019 08:47, Fujii Masao wrote:
> > On Mon, Oct 28, 2019 at 1:33 PM Craig Ringer 
> wrote:
> >> Hi folks
> >>
> >> I was recently surprised to notice that log_line_prefix doesn't support
> a cluster_name placeholder. I suggest adding one. If I don't hear
> objections I'll send a patch.
> > If we do this, cluster_name should be included in csvlog?
>
>
> Yes, absolutely.
>

Ok, I can put that together soon then.

I don't think it's too likely that people will shout about it being added
to csvlog. People using csvlog tend to be ingesting and postprocessing
their logs anyway. Plus gzip is really, really good at dealing with
redundancy.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [HACKERS] [PATCH] pageinspect function to decode infomasks

2019-11-10 Thread Craig Ringer
On Thu, 19 Sep 2019 at 10:04, Michael Paquier  wrote:

> On Wed, Sep 18, 2019 at 10:37:27AM +0900, Michael Paquier wrote:
> > I am attaching an updated patch for now that I would like to commit.
> > Are there more comments about the shape of the patch, the name of the
> > columns for the function, etc.?
>
> Okay, I have done an extra round of review, and committed it.


Thanks very much. That'll make life easier when debugging corruptions.
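
For anyone digging this thread up later, usage looks something like the
below, if I recall the committed name correctly (heap_tuple_infomask_flags,
in pageinspect; table name illustrative):

  SELECT t_ctid, raw_flags, combined_flags
  FROM heap_page_items(get_raw_page('mytable', 0)),
       LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)
  WHERE t_infomask IS NOT NULL OR t_infomask2 IS NOT NULL;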


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: libpq sslpassword parameter and callback function

2019-11-10 Thread Craig Ringer
On Fri, 1 Nov 2019 at 07:27, Andrew Dunstan 
wrote:

>
> On 10/31/19 6:34 PM, Andrew Dunstan wrote:
> > This time with attachment.
> >
> >
> > On 10/31/19 6:33 PM, Andrew Dunstan wrote:
> >> This patch provides for an sslpassword parameter for libpq, and a hook
> >> that a client can fill in for a callback function to set the password.
> >>
> >>
> >> This provides similar facilities to those already available in the JDBC
> >> driver.
> >>
> >>
> >> There is also a function to fetch the sslpassword from the connection
> >> parameters, in the same way that other settings can be fetched.
> >>
> >>
> >> This is mostly the excellent work of my colleague Craig Ringer, with a
> >> few embellishments from me.
> >>
> >>
> >> Here are his notes:
> >>
> >>
> >> Allow libpq to non-interactively decrypt client certificates that
> >> are stored
> >> encrypted by adding a new "sslpassword" connection option.
> >>
> >> The sslpassword option offers a middle ground between a cleartext
> >> key and
> >> setting up advanced key management via openssl engines, PKCS#11, USB
> >> crypto
> >> offload and key escrow, etc.
> >>
> >> Previously use of encrypted client certificate keys only worked if
> >> the user
> >> could enter the key's password interactively on stdin, in response
> >> to openssl's
> >> default prompt callback:
> >>
> >> Enter PEM pass phrase:
> >>
> >> That's infeasible in many situations, especially things like use from
> >> postgres_fdw.
> >>
> >> This change also allows admins to prevent libpq from ever prompting
> >> for a
> >> password by calling:
> >>
> >> PQsetSSLKeyPassHook(PQdefaultSSLKeyPassHook);
> >>
> >> which is useful since OpenSSL likes to open /dev/tty to prompt for a
> >> password,
> >> so even closing stdin won't stop it blocking if there's no user
> >> input available.
> >> Applications may also override or extend SSL password fetching with
> >> their own
> >> callback.
> >>
> >> There is deliberately no environment variable equivalent for the
> >> sslpassword
> >> option.
> >>
> >>
>
> I should also mention that this patch provides for support for DER
> format certificates and keys.
>
>
Yep, that was a trivial change I rolled into it.

FWIW, this is related to two other patches: the patch to allow passwordless
fdw connections with explicit superuser approval, and the patch to allow
sslkey/sslpassword to be set as user mapping options in postgres_fdw .
Together all three patches make it possible to use SSL client certificates
to manage authentication in postgres_fdw user mappings.
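
For illustration, once the related patches are in, the combined setup could
look roughly like this (option names follow the patches described above, the
final API may differ, and all values are made up):

  CREATE SERVER remote_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'app', sslmode 'verify-full');

  CREATE USER MAPPING FOR app_user SERVER remote_srv
    OPTIONS (user 'app_user',
             sslcert '/etc/postgres/certs/app_user.crt',
             sslkey '/etc/postgres/certs/app_user.key',
             sslpassword 'client-key-passphrase');

No password option is needed at all, which is where the
passwordless-user-mapping patch comes in.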


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Handy describe_pg_lock function

2019-11-10 Thread Craig Ringer
On Sun, 10 Nov 2019 at 13:42, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2019-11-08 14:49:25 +0800, Craig Ringer wrote:
> >> I recently found the need to pretty-print the contents of pg_locks. So
> >> here's a little helper to do it, for anyone else who happens to have
> that
> >> need. pg_identify_object is far from adequate for the purpose. Reckon I
> >> should turn it into C and submit?
>
> > Yea, I think we need to make it easier for users to understand
> > locking. I kind of wonder whether part of the answer would be to change
> > the details that pg_locks shows, or add a pg_locks_detailed or such
> > (presumably a more detailed version would include walking the dependency
> > graph to at least some degree, and thus more expensive).
>
> I think the actual reason why pg_locks is so bare-bones is that it's
> not supposed to require taking any locks of its own internally.  If,
> for example, we changed the database column so that it requires a lookup
> in pg_database, then the view would stop working if someone had an
> exclusive lock on pg_database --- pretty much exactly the kind of case
> you might wish to be investigating with that view.
>
> I don't have any objection to adding a more user-friendly layer
> to use for normal cases, but I'm hesitant to add any gotchas like
> that into the basic view.
>
>
Yeah.

You can always query pg_catalog.pg_lock_status() directly,  but that's not
really documented. I'd be fine with adding a secondary view.

That reminds me, I've been meaning to submit a decent "find blocking lock
relationships" view for some time too. It's absurd that people still have
to crib half-broken code from the wiki (
https://wiki.postgresql.org/wiki/Lock_Monitoring) to get a vaguely
comprehensible summary of what's waiting for what. We now
have pg_blocking_pids(), which is fantastic, but it's not AFAIK rolled into
any user-friendly view to help users out so they have to roll their own.

Anyone inclined to object to the addition of an official "pg_lock_details"
view with info like in my example function, and a "pg_lock_waiters" or
"pg_locks_blocked" view with info on blocking/blocked-by relationships? I'd
be inclined to add a C level function to help describe the lock subject of
a pg_locks row, then use that in system_views.sql for the "pg_lock_details"
view. Then build a "pg_lock_waiters" view on top of it
using pg_blocking_pids(). Reasonable?
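
As a rough sketch of what I mean by a "pg_lock_waiters" view, built only on
what already exists today (column selection illustrative):

  CREATE VIEW pg_lock_waiters AS
  SELECT waiter.pid    AS waiting_pid,
         waiter.query  AS waiting_query,
         blocker.pid   AS blocking_pid,
         blocker.query AS blocking_query
  FROM pg_stat_activity waiter
  CROSS JOIN LATERAL unnest(pg_blocking_pids(waiter.pid)) AS b(pid)
  JOIN pg_stat_activity blocker ON blocker.pid = b.pid;

The "pg_lock_details" part is where the C helper comes in, since describing
the locked object cleanly is the hard bit.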

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Allow superuser to grant passwordless connection rights on postgres_fdw

2019-11-10 Thread Craig Ringer
On Mon, 4 Nov 2019 at 12:20, Stephen Frost  wrote:

> Greetings,
>
> * Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
> > On 11/1/19 12:58 PM, Robert Haas wrote:
> > > On Thu, Oct 31, 2019 at 4:58 PM Andrew Dunstan
> > >  wrote:
> > >> This patch allows the superuser to grant passwordless connection
> rights
> > >> in postgres_fdw user mappings.
> > > This is clearly something that we need, as the current code seems
> > > woefully ignorant of the fact that passwords are not the only
> > > authentication method supported by PostgreSQL, nor even the most
> > > secure.
> > >
> > > But, I do wonder a bit if we ought to think harder about the overall
> > > authentication model for FDW. Like, maybe we'd take a different view
> > > of how to solve this particular piece of the problem if we were
> > > thinking about how FDWs could do LDAP authentication, SSL
> > > authentication, credentials forwarding...
> >
> > I'm certainly open to alternatives.
>
> I've long felt that the way to handle this kind of requirement is to
> have a "trusted remote server" kind of option- where the local server
> authenticates to the remote server as a *server* and then says "this is
> the user on this server, and this is the user that this user wishes to
> be" and the remote server is then able to decide if they accept that, or
> not.
>

The original use case for the patch was to allow FDWs to use SSL/TLS client
certificates. Each user-mapping has its own certificate - there's a
separate patch to allow that. So there's no delegation of trust via
Kerberos etc in that particular case.

I can see value in using Kerberos etc for that too though, as it separates
authorization and authentication in the same manner as most sensible
systems. You can say "user postgres@foo is trusted to vet users so you can
safely hand out tickets for any bar@foo that postgres@foo says is legit".

I would strongly discourage allowing all users on host A to authenticate as
user postgres on host B. But with appropriate user-mappings support, we
could likely support that sort of model for both SSPI and Kerberos.

A necessary prerequisite is that Pg be able to cope with passwordless
user-mappings though. Hence this patch.



>
> To be specific, there would be some kind of 'trust' established between
> the servers and only if there is some kind of server-level
> authentication, eg: dual TLS auth, or dual GSSAPI auth; and then, a
> mapping is defined for that server, which specifies what remote user is
> allowed to log in as what local user.
>
> This would be a server-to-server auth arrangement, and is quite
> different from credential forwarding, or similar.  I am certainly also a
> huge fan of the idea that we support Kerberos/GSSAPI credential
> forwarding / delegation, where a client willingly forwards to the PG
> server a set of credentials which then allow the PG server to
> authenticate as that user to another system (eg: through an FDW to
> another PG server).
>
> Of course, as long as we're talking pie-in-the-sky ideas, I would
> certainly be entirely for supporting both. ;)
>
> Thanks,
>
> Stephen
>


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Add a GUC variable that control logical replication

2019-11-10 Thread Craig Ringer
On Wed, 18 Sep 2019 at 16:39, Quan Zongliang <
zongliang.q...@postgresdata.com> wrote:

>
> Sybase has a feature to turn off replication at the session level: set
> replication = off, which can be temporarily turned off when there is a
> maintenance action on the table. Our users also want this feature.
> I add a new flag bit in xinfo, control it with a session-level variable,
> when set to true, this flag is written when the transaction is
> committed, and when the logic is decoded it abandons the transaction
> like aborted transactions. Since PostgreSQL has two types of
> replication, I call the variable "logical_replication" to avoid
> confusion and default value is true.
>

There's something related to this already. You can set the replication
origin for the transaction to the special value DoNotReplicateId
(replication origin id 65535). This will suppress replication of the
transaction, at least for output plugins that're aware of replication
origins.

This isn't presently exposed to SQL, it's there for the use of logical
replication extensions. It's possible to expose it with a pretty trivial C
function in an extension.

I think it's a bit of a hack TBH, it's something I perpetrated sometime in
the 9.4 series when we needed a way to suppress replication of individual
transactions. It originated out of core, so the original design was
constrained in how it worked, and maybe it would've actually made more
sense to use an xlinfo flag. Probably not worth changing now though.

Be extremely careful though. If you're hiding things from logical
replication you can get all sorts of confusing and exciting results. I very
strongly suggest you make anything like this superuser-only.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: tableam vs. TOAST

2019-11-10 Thread Craig Ringer
On Thu, 7 Nov 2019 at 22:45, Ashutosh Sharma  wrote:

> On Thu, Nov 7, 2019 at 7:35 PM Robert Haas  wrote:
> >
> > On Thu, Nov 7, 2019 at 1:15 AM Ashutosh Sharma 
> wrote:
> > > @Robert, Myself and Prabhat have tried running the test-cases that
> > > caused the checkpointer process to crash earlier multiple times but we
> > > are not able to reproduce it both with and without the patch. However,
> > > from the stack trace shared earlier by Prabhat, it is clear that the
> > > checkpointer process panicked due to fsync failure. But, there is no
> > > further data to know the exact reason for the fsync failure. From the
> > > code of checkpointer process (basically the function to process fsync
> > > requests) it is understood that, the checkpointer process can PANIC
> > > due to one of the following two reasons.
> >
> > Oh, I didn't realize this was a panic due to an fsync() failure when I
> > looked at the stack trace before.  I think it's concerning that
> > fsync() failed on Prabhat's machine, and it would be interesting to
> > know why that happened, but I don't see how this patch could possibly
> > *cause* fsync() to fail, so I think we can say that whatever is
> > happening on his machine is unrelated to this patch -- and probably
> > also unrelated to PostgreSQL.
> >
>
> That's right and that's exactly what I mentioned in my conclusion too.
>
>
In fact, I suspect this is PostgreSQL successfully protecting itself from
an unsafe situation.

Does the host have thin-provisioned storage? lvmthin, thin-provisioned SAN,
etc?

Is the DB on NFS?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Handy describe_pg_lock function

2019-11-07 Thread Craig Ringer
Hi all

I recently found the need to pretty-print the contents of pg_locks. So
here's a little helper to do it, for anyone else who happens to have that
need. pg_identify_object is far from adequate for the purpose. Reckon I
should turn it into C and submit?

  CREATE FUNCTION describe_pg_lock(IN l pg_locks,
      OUT lock_objtype text, OUT lock_objschema text,
      OUT lock_objname text, OUT lock_objidentity text,
      OUT lock_objdescription text)
  LANGUAGE sql VOLATILE RETURNS NULL ON NULL INPUT AS
  $$
  SELECT
    *,
    CASE
      WHEN l.locktype IN ('relation', 'extend') THEN
        'relation ' || lo.lock_objidentity
      WHEN l.locktype = 'page' THEN
        'relation ' || lo.lock_objidentity || ' page ' || l.page
      WHEN l.locktype = 'tuple' THEN
        'relation ' || lo.lock_objidentity || ' page ' || l.page || ' tuple ' || l.tuple
      WHEN l.locktype = 'transactionid' THEN
        'transactionid ' || l.transactionid
      WHEN l.locktype = 'virtualxid' THEN
        'virtualxid ' || l.virtualxid
      WHEN l.locktype = 'speculative token' THEN
        'speculative token'
      WHEN lock_objidentity IS NOT NULL THEN
        l.locktype || ' ' || lo.lock_objidentity
      ELSE
        l.locktype
    END
  FROM (
    SELECT *
    FROM pg_identify_object('pg_class'::regclass, l.relation, 0)
    WHERE l.locktype IN ('relation', 'extend', 'page', 'tuple')
    UNION ALL
    SELECT *
    FROM pg_identify_object(l.classid, l.objid, l.objsubid)
    WHERE l.locktype NOT IN ('relation', 'extend', 'page', 'tuple')
  ) AS lo(lock_objtype, lock_objschema, lock_objname, lock_objidentity);
  $$;
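
For example, it can be applied to the live lock list like so (illustrative):

  SELECT l.pid, l.mode, l.granted, d.lock_objdescription
  FROM pg_locks l, LATERAL describe_pg_lock(l) AS d;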


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: TestLib::command_fails_like enhancement

2019-11-07 Thread Craig Ringer
On Fri, 8 Nov 2019 at 06:28, Mark Dilger  wrote:

>
>
> On 10/31/19 10:02 AM, Andrew Dunstan wrote:
> >
> > This small patch authored by my colleague Craig Ringer enhances
> > Testlib's command_fails_like by allowing the passing of extra keyword
> > type arguments. The keyword initially recognized is 'extra_ipcrun_opts'.
> > The value for this keyword needs to be an array, and is passed through
> > to the call to IPC::Run.
>
> Hi Andrew, a few code review comments:
>
> The POD documentation for this function should be updated to include a
> description of the %kwargs argument list.
>
> Since command_fails_like is patterned on command_like, perhaps you
> should make this change to both of them, even if you only originally
> intend to use the new functionality in command_fails_like.  I'm not sure
> of this, though, since I haven't seen any example usage yet.
>
> I'm vaguely bothered by having %kwargs gobble up the remaining function
> arguments, not because it isn't a perl-ish thing to do, but because none
> of the other functions in this module do anything similar.  The function
> check_mode_recursive takes an optional $ignore_list array reference as
> its last argument.  Perhaps command_fails_like could follow that pattern
> by taking an optional $kwargs hash reference.
>

Yeah, that's probably sensible.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [Patch proposal] libpq portal support

2019-11-07 Thread Craig Ringer
On Thu, 7 Nov 2019 at 17:43, Sergei Fedorov 
wrote:

> Hello everybody,
>
> Yes, we will be happy to put our patch under the PostgreSQL License.
>
> Patch is attached to this email, master was rebased to head prior to
> creating the patch.
>
> We are using a C++ wrapper on top of libpq for using database connections
> in multithreaded asynchronous applications. For security reasons (and
> partially because we are too lazy to escape query parameters) we use
> prepared queries and parameter binding for execution. There are situations
> when we need to fetch the query results not in one batch but in a `paged`
> way, the most convenient way is to use the portals feature of PosgreSQL
> protocol.
>
>
By way of initial patch review: there's a lot of copy/paste here that
should be avoided if possible. It looks like the added function
PQsendPortalBindParams() is heavily based on PQsendQueryGuts(), which is
the common implementation shared by the existing PQsendQueryParams()
and PQsendQueryPrepared().

Similar for PQsendPortalExecute().

I'd like to see the common code factored out, perhaps by adding the needed
functionality into PQsendQueryGuts() etc.

The patch is also missing documentation; please add it
to doc/src/sgml/libpq.sgml in docbook XML format. See the existing function
examples.

I'd ask you to add test cover, but we don't really have a useful test suite
for libpq yet, so there's not much you can do there. It definitely won't
fly without the docs and copy/paste reduction though.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [Patch proposal] libpq portal support

2019-11-07 Thread Craig Ringer
On Thu, 7 Nov 2019 at 17:43, Sergei Fedorov 
wrote:

> Hello everybody,
>
> Yes, we will be happy to put our patch under the PostgreSQL License.
>
> Patch is attached to this email, master was rebased to head prior to
> creating the patch.
>
> We are using a C++ wrapper on top of libpq for using database connections
> in multithreaded asynchronous applications. For security reasons (and
> partially because we are too lazy to escape query parameters) we use
> prepared queries and parameter binding for execution. There are situations
> when we need to fetch the query results not in one batch but in a `paged`
> way, the most convenient way is to use the portals feature of PosgreSQL
> protocol.
>
>
>
Thanks. That's a really good reason. It'd also bring libpq closer to
feature-parity with PgJDBC.

Please add it to the commitfest app https://commitfest.postgresql.org/


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: TRACE_SORT defined by default

2019-10-29 Thread Craig Ringer
On Fri, 26 Apr 2019 at 05:49, Tomas Vondra  wrote:
>
> On Wed, Apr 24, 2019 at 06:04:41PM -0400, Tom Lane wrote:
> >Peter Geoghegan  writes:
> >> On Wed, Apr 24, 2019 at 2:15 PM Joe Conway  wrote:
> >>> Has anyone ever (or recently) measured the impact on performance to have
> >>> this enabled? Is it that generically useful for debugging of production
> >>> instances of Postgres that we really want it always enabled despite the
> >>> performance impact?
> >
> >> It is disabled by default, in the sense that the trace_sort GUC
> >> defaults to off. I believe that the overhead of building in the
> >> instrumentation without enabling it is indistinguishable from zero.
> >
> >It would probably be useful to actually prove that rather than just
> >assuming it.  I do see some code under the symbol that is executed
> >even when !trace_sort, and in any case Andres keeps complaining that
> >even always-taken branches are expensive ...
> >
>
> Did I hear the magical word "benchmark" over here?
>
> I suppose it'd be useful to have some actual numbers showing what
> overhead this actually has, and whether disabling it would make any
> difference. I can't run anything right away, but I could get us some
> numbers in a couple of days, assuming there is some agreement on which
> cases we need to test.


If you're worried about overheads of dtrace-style probes, you can
(with systemtap ones like we use) generate a set of semaphores as a
separate .so that you link into the final build. Then you can test for
TRACE_POSTGRESQL_FOO_BAR_ENABLED() and only do any work required to
generate input for the trace call if that returns true. You can
generally wrap that test in unlikely(), since the cost matters far less
once tracing is actually enabled.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




Re: TRACE_SORT defined by default

2019-10-29 Thread Craig Ringer
On Thu, 25 Apr 2019 at 06:41, Peter Geoghegan  wrote:
>
> On Wed, Apr 24, 2019 at 3:04 PM Tom Lane  wrote:
> > > It is disabled by default, in the sense that the trace_sort GUC
> > > defaults to off. I believe that the overhead of building in the
> > > instrumentation without enabling it is indistinguishable from zero.
> >
> > It would probably be useful to actually prove that rather than just
> > assuming it.
>
> The number of individual trace_sort LOG messages is proportionate to
> the number of runs produced.
>
> > I do see some code under the symbol that is executed
> > even when !trace_sort, and in any case Andres keeps complaining that
> > even always-taken branches are expensive ...
>
> I think that you're referring to the stuff needed for the D-Trace
> probes. It's a pity that there isn't better support for that, since
> Linux has a lot of options around static userspace probes these days
> (SystemTap is very much out of favor, and never was very popular).

Which is a real shame. I got into it last week and I cannot believe
I've wasted time and effort trying to get anything useful out of perf
when it comes to tracing. There's just no comparison.

As usual, the incoming replacement tools are broken, incompatible and
incomplete, especially for userspace. Because really, bah, users, who
cares about users? But yet issues with systemtap are being dismissed
with variants of "that's obsolete, use perf or eBPF-tools". Right,
just like a CnC router can be easily replaced with a chisel.

I expect eBPF-tools will be pretty amazing once it matures. But for
those of us stuck in "working with actual user applications" land, on
RHEL6 and RHEL7 and the like, it won't be doing so in a hurry.

With that said, static probes are useful but frustrating. The
dtrace-alike model SystemTap adopted, and that we use in Pg via
SystemTap's 'dtrace' script, generates quite dtrace-alike static probe
points, complete with the frustrating deficiencies of those probe
points. Most importantly, they don't record the probe argument names
or data types, and if you want to get a string value you need to
handle each string probe argument individually.

Static probes are fantastic as labels and they let you see the program
state in places that are often impractical for debuginfo-based DWARF
probing since they give you a stable, consistent way to see something
other than function args and return values. But they're frustratingly
deficient compared to DWARF-based probes in other ways.

> There seems to be a recognition among the Linux people that the
> distinction between users and backend experts is blurred. The kernel
> support for things like eBPF and BCC is still patchy, but that will
> change.

Just in time for it to be deemed obsolete and replaced with another
tool that only works with the kernel for the first couple of years,
probably. Like perf was.

> > Would any non-wizard really have a use for it?

Extremely strongly yes.

If nothing else you can wrap these tools up into toolsets and scripts
that give people insights into running systems just by running the
script. Non-invasively, without code changes, on an existing running
system.

I wrote a systemtap script last week that tracks each
transaction in a Pg backend from xid allocation though to
commit/rollback/prepare/commitprepared/rollbackprepared and takes
stats on xid allocations, txn durations, etc. Then figures out if the
committed txn needs logical decoding by any existing slots and tracks
how long each txn takes between commit until a given logical walsender
finishes decoding it and sending it. Collects stats on
inserts/updates/deletes etc in each txn, txn size, etc. Then follows
the txn and observes it being received and applied by a logical
replication client (pglogical). So it can report latencies and
throughputs in every step through the logical replication pipeline.

Right now that script isn't pretty, and it's not something easily
reused. But I could wrap it up in a script that turned on/off parts
based on what a user needed, wrote the stats to csv for
postprocessing, etc.

The underlying tool isn't for end users. But the end result sure can
be. After all, we don't expect users to mess with xact.c and
transam.c, we just expect them to run SQL, but we don't say PostgreSQL
is only for wizards not end users.

For that reason I'm trying to find time to add a large pile more probe
points to PostgreSQL. Informed in part by what I learned writing the
above script. I want probes for WaitEventSetWait, transam activities
(especially xid allocation, commit, rollback), 2pc, walsender
activity, reorderbuffer, walreceiver, slots, global xmin/catalog_xmin
changes, writes, file flushes, buffer access, and lots more. (Pg's
existing probes on transaction start and finish are almost totally
useless as you can't tell if the txn then gets an xid.)

Allow cluster_name in log_line_prefix

2019-10-27 Thread Craig Ringer
Hi folks

I was recently surprised to notice that log_line_prefix doesn't support a
cluster_name placeholder. I suggest adding one. If I don't hear objections
I'll send a patch.

Before anyone asks "but why?!":

* A constant (short) string in log_line_prefix is immensely useful when
working with logs from multi-node systems. Whether that's physical
streaming replication, logical replication, Citus, whatever, it doesn't
matter. It's worth paying the small storage price for sanity when looking
at logs.

* Yes you can embed it directly into log_line_prefix. But then it gets
copied by pg_basebackup or whatever you're using to clone standbys etc, so
you can easily land up with multiple instances reporting the same name.
This rather defeats the purpose.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [Patch proposal] libpq portal support

2019-10-18 Thread Craig Ringer
On Thu, 17 Oct 2019 at 03:12, Sergei Fedorov 
wrote:

> Hello everybody,
>
> Our company was in desperate need of portals in async interface of libpq,
> so we patched it.
>
> We would be happy to upstream the changes.
>
> The description of changes:
>
> Two functions in libpq-fe.h:
> PQsendPortalBindParams for sending a command to bind a portal to a
> previously prepared statement;
> PQsendPortalExecute for executing a previously bound portal with a given
> number of rows.
>
> A patch to pqParseInput3 in fe-protocol3.c to handle the `portal
> suspended` message tag.
>
> The patch is ready for review, but it lacks documentation, tests and usage
> examples.
>
> There are no functions for sending bind without params and no functions
> for sync interface, but they can easily be added to the feature.
>

If you are happy to put it under The PostgreSQL License, then sending it as
an attachment here is the first step.

If possible, please rebase it on top of git master.

Some explanation for why you have this need and what problems this solves
for you would be helpful as well.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-10-18 Thread Craig Ringer
On Thu, 17 Oct 2019 at 21:19, Alvaro Herrera 
wrote:

> On 2019-Sep-26, Alvaro Herrera wrote:
>
> > On 2019-Sep-26, Jeff Janes wrote:
>
> > > Hi Alvaro, does this count as a review?
> >
> > Well, I'm already a second pair of eyes for Craig's code, so I think it
> > does :-)  I would have liked confirmation from Craig that my change
> > looks okay to him too, but maybe we'll have to go without that.
>
> There not being a third pair of eyes, I have pushed this.
>
> Thanks!
>
>
> Thanks.

I'm struggling to keep up with my own threads right now...


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-10-13 Thread Craig Ringer
On Sun, 13 Oct 2019 at 19:50, Amit Kapila  wrote:

> On Sun, Oct 13, 2019 at 12:25 PM Dilip Kumar 
> wrote:
> >
> > On Thu, Oct 3, 2019 at 2:43 PM Amit Kapila 
> wrote:
> > >
> > > 3.
> > > + *
> > > + * While updating the existing change with detoasted tuple data, we
> need to
> > > + * update the memory accounting info, because the change size will
> differ.
> > > + * Otherwise the accounting may get out of sync, triggering
> serialization
> > > + * at unexpected times.
> > > + *
> > > + * We simply subtract size of the change before rejiggering the
> tuple, and
> > > + * then adding the new size. This makes it look like the change was
> removed
> > > + * and then added back, except it only tweaks the accounting info.
> > > + *
> > > + * In particular it can't trigger serialization, which would be
> pointless
> > > + * anyway as it happens during commit processing right before handing
> > > + * the change to the output plugin.
> > >   */
> > >  static void
> > >  ReorderBufferToastReplace(ReorderBuffer *rb, ReorderBufferTXN *txn,
> > > @@ -3023,6 +3281,13 @@ ReorderBufferToastReplace(ReorderBuffer *rb,
> ReorderBufferTXN *txn,
> > >   if (txn->toast_hash == NULL)
> > >   return;
> > >
> > > + /*
> > > + * We're going modify the size of the change, so to make sure the
> > > + * accounting is correct we'll make it look like we're removing the
> > > + * change now (with the old size), and then re-add it at the end.
> > > + */
> > > + ReorderBufferChangeMemoryUpdate(rb, change, false);
> > >
> > > It is not very clear why this change is required.  Basically, this is
> done at commit time after which actually we shouldn't attempt to spill
> these changes.  This is mentioned in comments as well, but it is not clear
> if that is the case, then how and when accounting can create a problem.  If
> possible, can you explain it with an example?
> > >
> > IIUC, we are keeping the track of the memory in ReorderBuffer which is
> > common across the transactions.  So even if this transaction is
> > committing and will not spill to dis but we need to keep the memory
> > accounting correct for the future changes in other transactions.
> >
>
> You are right.  I somehow missed that we need to keep the size
> computation in sync even during commit for other in-progress
> transactions in the ReorderBuffer.  You can ignore this point or maybe
> slightly adjust the comment to make it explicit.


Does anyone object if we add the reorder buffer total size & in-memory size
to struct WalSnd too, so we can report it in pg_stat_replication?

I can follow up with a patch to add on top of this one if you think it's
reasonable. I'll also take the opportunity to add a number of tracepoints
across the walsender and logical decoding, since right now it's very opaque
in production systems ... and everyone just LOVES hunting down debug syms
and attaching gdb to production DBs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)

2019-10-10 Thread Craig Ringer
On Thu, 10 Oct 2019 at 23:45, Andres Freund  wrote:

>
> Unless schema qualified you can't rely on that even without search_path
> changing. Consider an object in schema b existing, with a search_path of
> a,b. Even without the search path changing, somebody could create a type
> in a, "masking" the one in b.


True. I remember dealing with (or trying to deal with) that mess in BDR's
DDL replication.

I guess the question becomes "what is required to permit apps, clients
or drivers to safely and efficiently cache mappings of object names to
server side metadata".

E.g. if I have a non-schema-qualified relation name what server side help
is needed to make it possible to safely cache its column names, column
order, and column types? Or for a given function name, cache whether it's a
function or procedure, its IN, INOUT and OUT parameter names, positions and
types, etc?

You sensibly point out that being notified of search_path changes is not
sufficient. It might still be useful.

We have comprehensive server-side cache invalidation support. I wonder if
it's reasonable to harness that. Let clients request that we accumulate
invalidations and deliver them on the wire to the client at the end of each
statement, before ReadyForQuery, along with a backend-local invalidation
epoch that increments each time we send invalidations to the client.

The problem is that the client would have to send the invalidation epoch
along with each query so the server could tell it to flush its cache and
retry if there were invalidations since the end of the last query.

None of that is likely to be pretty given that we've been unable to agree
on any way to extend the protocol at client request.

Anyone have any ideas for "near enough is good enough" approaches or
methods that'd work with some client assistance / extension support / etc?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Minimal logical decoding on standbys

2019-10-09 Thread Craig Ringer
On Tue, 1 Oct 2019 at 02:08, Robert Haas  wrote:

>
> Why does create_logical_slot_on_standby include sleep(1)?


Yeah, we really need to avoid sleeps in regression tests.

If you need to wait, use a DO block that polls the required condition, and
wrap the sleep in that with a much longer total timeout. In BDR and
pglogical's pg_regress tests I've started to use a shared prelude that sets
a bunch of psql variables that I use as helpers for this sort of thing, so
I can just write :wait_slot_ready instead of repeating the same SQL command
a pile of times across the tests.
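
For example, a minimal polling helper along those lines (the condition and
slot name are purely illustrative) looks like:

  DO $$
  DECLARE
    tries integer := 100;   -- roughly 10s at 100ms per attempt
  BEGIN
    LOOP
      EXIT WHEN EXISTS (
        SELECT 1 FROM pg_replication_slots
        WHERE slot_name = 'standby_slot' AND active);
      tries := tries - 1;
      IF tries <= 0 THEN
        RAISE 'timed out waiting for slot to become active';
      END IF;
      PERFORM pg_sleep(0.1);
    END LOOP;
  END
  $$;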

That reminds me: I'm trying to find the time to write a couple of patches
to pg_regress to help make life easier too:

- Prelude and postscript .psql files that run before/after every test step
to set variables, do cleanup etc

- Test header comment that can be read by pg_regress to set a per-test
timeout

- Allow pg_regress to time out individual tests and continue with the next
test

- Test result postprocessing by script, where pg_regress writes the raw
test results then postprocesses it with a script before diffing the
postprocessed output. This would allow us to have things like /*
BEGIN_TESTIGNORE */ ... /* END_TESTIGNORE */ blocks for diagnostic output
that we want available but don't want to be part of actual test output. Or
filter out NOTICEs that vary in output. That sort of thing.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Minimal logical decoding on standbys

2019-10-09 Thread Craig Ringer
On Sat, 6 Apr 2019 at 07:15, Andres Freund  wrote:

> Hi,
>
> Thanks for the new version of the patch.  Btw, could you add Craig as a
> co-author in the commit message of the next version of the patch? Don't
> want to forget him.
>

That's kind, but OTOH you've picked it up and done most of the work by now.

I'm swamped with extension related work and have been able to dedicate
frustratingly little time to -hackers and the many patches I'd like to be
working on.


Re: Shared memory

2019-10-09 Thread Craig Ringer
On Fri, 4 Oct 2019 at 17:51, Natarajan R  wrote:

> Why does Postgres not provide a way to free shared memory?
>

It does.

You are presumably looking at static shared memory segments, which are
assigned at server start. Most extensions need to use one of these, of
fixed size, for housekeeping and things like storing LWLocks.

But extensions can use DSM or DSA for most of their memory use, giving them
flexible storage.

PostgreSQL doesn't have a fully dynamic shared heap with
malloc()/free()-like semantics. AFAIK we don't support the use of AllocSet
MemoryContexts on DSM-backed shared memory. That might be nice, but it's
complicated.

Most extensions use coding patterns like variable length arrays instead.

I strongly advise you to go read the pglogical source code. It demonstrates
a lot of the things you will need to understand to write complex extensions
that touch multiple databases.

PostgreSQL itself could make this a LOT easier though. pglogical's
basic worker and connection management would be considerably simpler if:

- Extensions could add shared catalog relations
- we could access, copy, and struct-embed BackgroundWorkerHandle
- registered bgworkers weren't relaunched after Pg crash-restart (as
discussed on prior threads)
- the bgworker registration struct had more than uint32 worth of room for
arguments
- extensions could request that Pg maintain extra shmem space for each
bgworker for the extension's use (so we didn't have to maintain our own
parallel array of shmem entries for each worker and coordinate that with
the bgworker's own areas)

I guess I should probably pony up with a patch for some of this

While I'm at it, there are so many other extension points I wish for, with
the high points being:

- Extension-defined syscaches
- Extension-defined locktypes/methods/etc
- More extensible pg_depend
- Hooks in table rewrite

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: pg_init

2019-10-09 Thread Craig Ringer
On Wed, 9 Oct 2019 at 00:33, Natarajan R  wrote:

> I want to read pg_database from pg_init...
>
> Is using heap_open() possible? Or is there any other way?
>

It's not possible from _PG_init .

I replied to a similar thread with details on how bgworkers can access
different databases; look at the archives.

The gist is that you have to register a bgworker that attaches to shared
memory and to a database (or use InvalidOid if you only want shared catalog
access), then do your work from there.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Transparent Data Encryption (TDE) and encrypted files

2019-10-09 Thread Craig Ringer
On Wed, 9 Oct 2019 at 22:30, Stephen Frost  wrote:

>
> - All decryption happens in a given backend when it's sending data to
>   the client
>

That is not what I think of as TDE. But upon review, it looks like I'm
wrong, and the usual usage of TDE is for server-side-only encryption
at-rest.

But when I'm asked about TDE, people are generally actually asking for data
that's encrypted at rest and in transit, where the client driver is
responsible for data encryption/decryption transparently to the
application. The server is expected to be able to mark columns as
encrypted, so it can report the column's true datatype while storing a
bytea-like encrypted value for it instead. In this case the server does not
know the column encryption/decryption key at all, and it cannot perform any
operations on the data except for input and output.

Some people ask for indexable encrypted columns, but I tend to explain to
them how impractical and inefficient that is. You can support hash indexes
if you don't salt the encrypted data, but that greatly weakens the
encryption by allowing attackers to use dictionary attacks and other brute
force techniques efficiently. And you can't support b-tree > and < without
very complex encryption schemes (
https://en.wikipedia.org/wiki/Homomorphic_encryption).

I see quite a lot of demand for this column level driver-assisted
encryption. I think it'd actually be quite simple for the PostgreSQL server
to provide support for it too, since most of the work is done by the
driver. But I won't go into the design here since this thread appears to be
about encryption at rest only, fully server-side.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: TCP Wrappers

2019-10-09 Thread Craig Ringer
On Thu, 10 Oct 2019 at 07:15, Tom Lane  wrote:

>
> That doesn't bode well for the number of people who would use or care
> about such a feature.
>

Agreed.  tcp_wrappers predates the widespread availability of easy,
effective software firewalls. Back when services listened on 0.0.0.0 and if
you were lucky you had ipfwadm, tcp_wrappers made a lot of sense. Now it's
IMO a pointless layer of additional complexity that no longer serves a
purpose.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)

2019-10-08 Thread Craig Ringer
On Thu, 11 Jul 2019 at 04:34, Tom Lane  wrote:

> Robert Haas  writes:
> > On Wed, Jul 10, 2019 at 9:59 AM Dave Cramer  wrote:
> >> I'm still a bit conflicted about what to do with search_path as I do
> believe this is potentially a security issue.
> >> It may be that we always want to report that and possibly back patch it.
>
> > I don't see that as a feasible option unless we make the logic that
> > does the reporting smarter.  If it changes transiently inside of a
> > security-definer function, and then changes back, my recollection is
> > that right now we would report both changes.  I think that could cause
> > a serious efficiency problem if you are calling such a function in a
> > loop.
>
> And, even more to the point, what's the client side going to do with
> the information?  If there was a security problem inside the
> security-definer function, it's too late.  And the client can't do
> much about it anyway.
>

Yep. For search_path I definitely agree.

In other places I've (ab)used GUC_REPORT to push information back to the
client as a workaround for the lack of protocol extensibility / custom
messages. It's a little less ugly than abusing NOTICE messages. I'd prefer
a real way to add optional/extension messages, but in the absence of that
extension-defined GUCs may have good reasons to want to report multiple
changes within a single statement/function/etc.

With that said, most of the time I think we could argue that it's
reasonable to fire a GUC_REPORT if the *top-level* GUC nestlevel values
change. That'd solve the search_path spam issue while still giving Dave
what he wants, amongst other things.

BTW, a good argument for the client wanting to be notified of search_path
changes might be for clients that want to cache name=>oid mappings for
types, relations, etc. The JDBC driver would be able to benefit from that
if it could trust that the same name would map to the same type (for a
top-level statement) in future, but currently it can't.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: let's make the list of reportable GUCs configurable (was Re: Add %r substitution for psql prompts to show recovery status)

2019-10-08 Thread Craig Ringer
On Fri, 12 Jul 2019 at 01:27, Robert Haas  wrote:

>
> We have steadfastly refused to provide protocol-level tools for things
> like "please change my user ID, and don't let anyone change it again
> via SQL," and that's a huge problem for things like connection poolers
> which can't parse all the SQL flowing through the connection (because
> figuring out what it does requires solving the Halting Problem) and
> wouldn't want to if they could for performance reasons. I think that's
> a huge mistake.


I very strongly agree. The inability to limit SET and RESET of SESSION
AUTHORIZATION and ROLE is a huge pain point and it's far from the only one.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Regarding extension

2019-10-08 Thread Craig Ringer
On Fri, 4 Oct 2019 at 13:18, Tom Lane  wrote:

> Natarajan R  writes:
> > Me:  Thanks Tomas, But this is for that particular database only, I want
> > to get the *list of database Id's* on which my extension is installed
> > during *PG_INIT* itself...
>
> You can't.  In the first place, that information simply isn't obtainable,
> because a session running within one database doesn't have access to the
> catalogs of other databases in the cluster.  (You could perhaps imagine
> firing up connections to other DBs a la dblink/postgres_fdw, but that will
> fail because you won't necessarily have permissions to connect to every
> database.)  In the second place, it's a pretty terrible design to be
> attempting any sort of database access within _PG_init, because that
> precludes loading that module outside a transaction; for example you
> will not be able to preload it via shared_preload_libraries or allied
> features.
>

Absolutely agreed. Having done this myself, it's much, much harder than
you'd expect and not something I suggest anyone try unless it's absolutely
necessary.

It'd be an absolute dream if extensions could create their own shared
catalogs; that'd make life *so* much easier. But I seem to recall looking
at that and nope-ing right out. That was a while ago so I should probably
revisit it.

Anyhow: BDR and pglogical are extensions that do need to concern themselves
with what's in various databases, so this is an issue I've worked with day
to day for some time.

BDR1 used a custom security label and the pg_shseclabel catalog to mark
databases that were BDR-enabled. It launched a worker that connected to
database InvalidOid, so it could read only the global shared catalogs, then
it scanned them to find out which DBs to launch individual workers for.
This interacted poorly with pg_dump/pg_restore and proved fragile, so I
don't recommend it.
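
For what it's worth, the discovery query that worker ran against the shared
catalogs looked roughly like this (provider name illustrative):

  SELECT d.datname, l.label
  FROM pg_shseclabel l
  JOIN pg_database d ON d.oid = l.objoid
  WHERE l.classoid = 'pg_database'::regclass
    AND l.provider = 'bdr';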

pglogical instead launches a static bgworker with no DB connections. On
startup or when it gets a suitable message over its extension shmem
segment + a latch set, it launches new workers for each DB. Each worker
inspects the DB to check for the presence of the pglogical extension and
exits if it isn't found.

All in all, it's pretty clumsy, though it works very well.

We have to do our own process management and registration. Workarounds must
be put in place for processes that fail to launch and then have a new
process take their shmem slot, and for various other things. pglogical ends
up having to duplicate quite a bit of the bgworker and postmaster management
infrastructure because it's not extensible and has some serious deficiencies
in error/crash handling.

(We also have our own dependency management, lock management, shared cache
invalidations, syscache/catcache-like mechanism, and other areas where we'd
rather extend Pg's infrastructure but can't. Being able to create our own
dependency types, custom lock types/methods, custom syscaches we could get
invalidations for, etc would just be amazing. But each would likely be a
major effort to get into core, if we could get it accepted at all given the
"in core users" argument, and we'd have to keep the old method around
anyway...)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: The serial pseudotypes

2019-08-25 Thread Craig Ringer
On Mon, 26 Aug 2019 at 01:42, Tom Lane  wrote:

> Vik Fearing  writes:
> > On 25/08/2019 18:59, Tom Lane wrote:
> >> Vik Fearing  writes:
> >>> Is there a reason why the serial pseudotypes still behave as they did
> >>> pre-v10 and don't map to GENERATED BY DEFAULT AS IDENTITY these days?
>
> >> Backwards compatibility?
>
> > With what?
>
> Applications that expect declaring a serial column to result in the same
> catalog side-effects as before.  The default expressions look different,
> and the dependencies look different.  For instance, an app that expected
> atthasdef to tell it something about what happens when a column's value
> is omitted would be surprised.  An app that thought it could alter the
> default expression for a column originally declared serial would be even
> more surprised.
>

Right. I'd be very leery of changing this w/o a lot of checking and/or some
BC to help apps that expect traditional SERIALs to work.

Hibernate is a rather widely used example. Not a good example mind you, but
a very widely used one.

Its PostgreSQL dialect directly calls nextval(...) and expects the value
returned to be the next value off the sequence based on the sequence's
increment counter. This is true of many other ORMs etc too. To reduce
round-trips and to give them control of when they flush dirty objects from
their internal writeback cache to the DB, they tend to preallocate object
IDs from an internal pool they populate periodically from the database
sequence generator. This design isn't even totally unreasonable given the
access patterns of these tools.

Hibernate also has an existing (IMO awful) defect of defaulting to a
sequence increment value of 50 in its schema configuration/generation
tools. It fails to check the increment on pre-existing schemas and blindly
assumes 50 so if nextval returns 500 it will merrily return values
(450..500] from its internal sequence value assignment pool. Much mess
ensues. So lets not pretend it is good, it's kind of horrible, but like
ActiveRecord and other monstrosities it's also a very widely used product
we need to care about. Unfortunately ;)
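
To spell the failure mode out (names and values illustrative):

  -- what the DBA created, or what an older schema already had:
  CREATE SEQUENCE widget_id_seq INCREMENT BY 1;

  -- suppose the sequence has been in use for a while and this returns 500:
  SELECT nextval('widget_id_seq');

  -- Hibernate, silently assuming INCREMENT BY 50, treats that one call as
  -- having reserved the whole block (450..500] and assigns those ids from
  -- its local pool, while the database hands out 501, 502, ... to the next
  -- callers of nextval().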

This reminds me though, I want to revisit my nextval(regclass, n_ids) patch
I have lying around somewhere. Apps should be able to request non-default
chunk allocations from nextval so we can avoid dealing with these
unpleasant assumptions about increment size...

... or worse, the apps that try to "fix" this by calling nextval then
setval to jump the sequence to the value they think it should have next.
And yes, I've seen this. In production code.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-16 Thread Craig Ringer
On Fri, 16 Aug 2019 at 15:30, Konstantin Knizhnik 
wrote:


>
> 1. Statistic for global temporary tables (including number of tuples,
> pages and all visible flag).
> My position is the following: while in most cases it should not be a
> problem, because users rarely create indexes or do analyze for temporary
> tables,
> there can be situations when differences in data sets of global temporary
> tables in different backends can really be a problem.
> Unfortunately I can not propose good solution for this problem. It is
> certainly possible to create some private (per-backend) cache for this
> metadata.
> But it seems to requires changes in many places.
>

Yeah. I don't really like just sharing them but it's not that bad either.


> 2. Your concerns about performance penalty of global temp tables accessed
> through shared buffers comparing with local temp tables access through
> local buffers.
> I think that this concern is not  actual any more because there is
> implementation of global temp tables using local buffers.
> But my experiments doesn't show significant difference in access speed of
> shared and local buffers. As far as shared buffers are used to be much
> larger than local buffers,
> there are more chances to hold all temp relation in memory without
> spilling it to the disk. In this case access to global temp table will be
> much faster comparing with access to
> local temp tables.
>

You ignore the costs of evicting non-temporary data from shared_buffers,
i.e. contention for space. Also increased chance of backends being forced
to do direct write-out due to lack of s_b space for dirty buffers.

> In case of pulling all content of temp table in memory (pg_prewarm)
> global temp table with shared buffers becomes faster.

Who would ever do that?

> I forget or do not notice some of your questions, would you be so kind as
> to repeat them?
>



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-16 Thread Craig Ringer
On Tue, 13 Aug 2019 at 21:50, Konstantin Knizhnik 
wrote:

> As far as I understand relpages and reltuples are set only when you
>> perform "analyze" of the table.
>>
>
> Also autovacuum's autoanalyze.
>
>
> When it happen?
> I have created normal table, populated it with some data and then wait
> several hours but pg_class was not updated for this table.
>
>

heap_vacuum_rel() in src/backend/access/heap/vacuumlazy.c below

 * Update statistics in pg_class.

which I'm pretty sure is common to explicit vacuum and autovacuum. I
haven't run up a test to verify 100% but most DBs would never have relpages
etc set if autovac didn't do it since most aren't explicitly VACUUMed at
all.

I thought it was done when autovac ran an analyze, but it looks like it's
all autovac. Try setting very aggressive autovac thresholds and inserting +
deleting a bunch of tuples maybe.

> I attach to this mail slightly refactored versions of this patches with
> fixes of issues reported in your review.
>

Thanks.

Did you have a chance to consider my questions too? I see a couple of
things where there's no patch change, which is fine, but I'd be interested
in your thoughts on the question/issue in those cases.

>

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Configuring bgw_restart_time

2019-08-13 Thread Craig Ringer
On Tue, 13 Aug 2019 at 20:37, Jeremy Finzel  wrote:

> I am working on an extension that uses background workers, and interested
> in adding code for it to auto-restart after a crash or server restart
> (similar to as-coded in worker_spi).
>

What pglogical does for this is use dynamic background workers with restart
turned off. It does its own worker exit and restart handling from a manager
worker that's an always-running static bgworker.

It's not ideal as it involves a considerable amount of extra work, but with
BDR we rapidly found that letting postgres itself restart bgworkers was
much too inflexible and hard to control. Especially given the issues around
soft-crash restarts and worker registration (see thread
https://www.postgresql.org/message-id/flat/534E6569.1080506%402ndquadrant.com)
.
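
For reference, registering such a dynamic worker with postmaster restarts
disabled looks roughly like this. It's a sketch with made-up extension and
function names, not pglogical's actual code:

    BackgroundWorker worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                       BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = BGW_NEVER_RESTART;    /* manager restarts it */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
    snprintf(worker.bgw_name, BGW_MAXLEN, "my extension worker");
    worker.bgw_notify_pid = MyProcPid;  /* manager hears about start/exit */

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(LOG,
                (errmsg("could not register dynamic background worker")));

The manager then tracks the worker itself (WaitForBackgroundWorkerStartup,
GetBackgroundWorkerPid) and decides whether, when and how to relaunch it.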

> But I'm also interested in being able to configure bgw_restart_time using a
> GUC without having to restart the server, using only SIGHUP.  For example,
> I want to normally have the worker restart after 10 seconds.  But if I am
> doing maintenance on the server (without a db restart), I perhaps want to
> change this to -1 (BGW_NEVER_RESTART), kill the worker, do my business,
> then restart the worker.
>

Instead of doing that I suggest having a SQL-callable function that sets a
flag in a shared memory segment used by the worker then sets the worker's
ProcLatch to wake it up if it's sleeping. The flag can *ask* it to exit
cleanly. If its exit code is 0 it will not be restarted.

You could also choose to have the worker exit with code 0 on SIGTERM, again
causing itself to be unregistered and not restarted.
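
A minimal sketch of that arrangement, with made-up names: the shared struct
is assumed to be allocated via the usual shmem request/startup hooks, the
worker publishes its MyLatch pointer into it at startup, and includes are
omitted.

typedef struct MyWorkerCtl
{
    Latch      *worker_latch;       /* published by the worker at startup */
    bool        exit_requested;     /* set by my_worker_request_exit() */
} MyWorkerCtl;

extern MyWorkerCtl *MyWorkerCtlShmem;

PG_FUNCTION_INFO_V1(my_worker_request_exit);

Datum
my_worker_request_exit(PG_FUNCTION_ARGS)
{
    MyWorkerCtlShmem->exit_requested = true;
    if (MyWorkerCtlShmem->worker_latch != NULL)
        SetLatch(MyWorkerCtlShmem->worker_latch);
    PG_RETURN_VOID();
}

/* In the worker's main loop, after each WaitLatch() wakeup: */
static void
my_worker_maybe_exit(void)
{
    if (MyWorkerCtlShmem->exit_requested)
        proc_exit(0);   /* exit code 0: postmaster unregisters the worker */
}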

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-13 Thread Craig Ringer
On Tue, 13 Aug 2019 at 16:19, Konstantin Knizhnik 
wrote:

>
>
> On 13.08.2019 8:34, Craig Ringer wrote:
>
> On Tue, 13 Aug 2019 at 00:47, Pavel Stehule 
> wrote:
>
>
>> But Postgres is not storing this information now anywhere else except
>>> statistic, isn't it?
>>>
>>
>> not only - critical numbers are reltuples, relpages from pg_class
>>
>
> That's a very good point. relallvisible too. How's the global temp table
> impl handling that right now, since you won't be changing the pg_class row?
> AFAICS relpages doesn't need to be up to date (and reltuples certainly
> doesn't) so presumably you're just leaving them as zero?
>
> As far as I understand relpages and reltuples are set only when you
> perform "analyze" of the table.
>

Also autovacuum's autoanalyze.

> What happens right now if you ANALYZE or VACUUM ANALYZE a global temp
> table? Is it just disallowed?
>
>
> No, it is not disallowed now.
> It updates the statistic and also fields in pg_class which are shared by
> all backends.
> So all backends will now build plans according to this statistic.
> Certainly it may lead to not so efficient plans if there are large
> differences in number of tuples stored in this table in different backends.
> But seems to me critical mostly in case of presence of indexes for
> temporary table. And it seems to me that users are created indexes for
> temporary tables even rarely than doing analyze for them.
>

That doesn't seem too bad TBH. Hacky but it doesn't seem dangerously wrong
and as likely to be helpful as not if clearly documented.


> Temporary tables (both local and global) as well as unlogged tables are
> not subject of logical replication, aren't them?
>
>
Right. But in the same way that they're still present in the catalogs, I
think they still affect catalog snapshots maintained by logical decoding's
historic snapshot manager as temp table creation/drop will still AFAIK
cause catalog invalidations to be written on commit. I need to double check
that.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-13 Thread Craig Ringer
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik 
wrote:

>
>
> Ok, here it is: global_private_temp-1.patch
>


Initial pass review follows.

Relation name "SESSION" is odd. I guess you're avoiding "global" because
the data is session-scoped, not globally temporary. But I'm not sure
"session" fits either - after all regular temp tables are also session
temporary tables. I won't focus on naming further beyond asking that it be
consistent, though; right now there's a mix of "global" in some places and
"session" in others.


Why do you need to do all this indirection with changing RelFileNode to
RelFileNodeBackend in the bufmgr, changing BufferGetTag etc? Similarly,
your changes of RelFileNodeBackendIsTemp to RelFileNodeBackendIsLocalTemp .
Did you look into my suggestion of extending the relmapper so that global
temp tables would have a relfilenode of 0 like pg_class etc, and use a
backend-local map of oid-to-relfilenode mappings? I'm guessing you did it
the way you did instead to lay the groundwork for cross-backend sharing,
but if so it should IMO be in that patch. Maybe my understanding of the
existing temp table mechanics is just insufficient as I
see RelFileNodeBackendIsTemp is already used in some aspects of existing
temp relation handling.

Similarly, TruncateSessionRelations probably shouldn't need to exist in
this patch in its current form; there's no shared_buffers use to clean and
the same file cleanup mechanism should handle both session-temp and
local-temp relfilenodes.


A number of places make a change like this:

rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
+ rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION

and I'd like to see a test macro or inline static for it since it's
repeated so much. Mostly to make the intent clear: "is this a relation with
temporary backend-scoped data?"
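
Something like the following sketch is what I have in mind, where
RELPERSISTENCE_SESSION is the constant your patch introduces and the name is
just a suggestion:

/* Does this relation hold temporary, backend-scoped data? */
static inline bool
RelationHasBackendLocalData(Relation rel)
{
    return rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP ||
           rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION;
}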


This test:

+ if (blkno == BTREE_METAPAGE && PageIsNew(BufferGetPage(buf)) &&
IsSessionRelationBackendId(rel->rd_backend))
+ _bt_initmetapage(BufferGetPage(buf), P_NONE, 0);

seems sensible but I'm wondering if there's a way to channel initialization
of global-temp objects through a bit more of a common path, so it reads
more obviously as a common test applying to all global-temp tables. "Global
temp table not initialized in session yet? Initialize it." So we don't have
to have every object type do an object type specific test for "am I
actually uninitialized?" in all paths it might hit. I guess I expected to
see something more like a

if (RelGlobalTempUninitialized(rel))

but maybe I've been doing too much Java ;)
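
i.e. vaguely the following sketch, though it'd probably need the buffer too
since the check is page-level (names illustrative only):

static inline bool
RelGlobalTempUninitialized(Relation rel, Buffer buf)
{
    return rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION &&
           PageIsNew(BufferGetPage(buf));
}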

A similar test reappears here for sequences:

+ if (rel->rd_rel->relpersistence == RELPERSISTENCE_SESSION &&
PageIsNew(page))

Why is this written differently?


Sequence initialization ignores sequence startval/firstval settings. Why?


- else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+ else if (newrelpersistence != RELPERSISTENCE_TEMP)

Doesn't this change the test outcome for RELPERSISTENCE_UNLOGGED?
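
If the intent there is only to also cover the new session persistence, an
explicit test would leave the UNLOGGED behaviour alone, e.g. (sketch):

else if (newrelpersistence == RELPERSISTENCE_PERMANENT ||
         newrelpersistence == RELPERSISTENCE_SESSION)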


In PreCommit_on_commit_actions, in the the ONCOMMIT_DELETE_ROWS case, is
there any way to still respect the XACT_FLAGS_ACCESSEDTEMPNAMESPACE flag if
the oid is for a backend-temp table not a global-temp table?


+ bool isLocalBuf = SmgrIsTemp(smgr) && relpersistence ==
RELPERSISTENCE_TEMP;

Won't session-temp tables have local buffers too? Until your next patch
that adds shared_buffers storage for them anyway?


+ * These is no need to separate them at file system level, so just
subtract SessionRelFirstBackendId
+ * to avoid too long file names.

I agree that there's no reason to be able to differentiate between
local-temp and session-temp relfilenodes at the filesystem level.

> Also I have attached updated version of the global temp tables with shared
> buffers - global_shared_temp-1.patch
> It is certainly larger (~2k lines vs. 1.5k lines) because it is changing
> BufferTag and related functions.
> But I do not think that this different is so critical.
> Still have a wish to kill two birds with one stone:)
>
> --
>
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-12 Thread Craig Ringer
On Tue, 13 Aug 2019 at 00:47, Pavel Stehule  wrote:


> But Postgres is not storing this information now anywhere else except
>> statistic, isn't it?
>>
>
> not only - critical numbers are reltuples, relpages from pg_class
>

That's a very good point. relallvisible too. How's the global temp table
impl handling that right now, since you won't be changing the pg_class row?
AFAICS relpages doesn't need to be up to date (and reltuples certainly
doesn't) so presumably you're just leaving them as zero?

What happens right now if you ANALYZE or VACUUM ANALYZE a global temp
table? Is it just disallowed?

I'll need to check, but I wonder if periodically updating those fields in
pg_class impacts logical decoding too. Logical decoding must treat
transactions with catalog changes as special cases where it creates custom
snapshots and does other expensive additional work.
(See ReorderBufferXidSetCatalogChanges in reorderbuffer.c and its
callsites). We don't actually need to know relpages and reltuples during
logical decoding. It can probably ignore relfrozenxid and relminmxid
changes too, maybe even pg_statistic changes though I'd be less confident
about that one.

At some point I need to patch in a bunch of extra tracepoints and do some
perf tracing to see how often we do potentially unnecessary snapshot
related work in logical decoding.


> There was proposal to cache relation size,  but it is not implemented yet.
>> If such cache exists, then we can use it to store local information about
>> global temporary tables.
>> So if 99% of users do not perform analyze for temporary tables, then them
>> will not be faced with this problem, right?
>>
>
> they use default statistics based on relpages. But for 1% of applications
> statistics are critical - almost always for OLAP applications.
>

Agreed. It's actually quite a common solution to user problem reports /
support queries about temp table performance: "Run ANALYZE. Consider
creating indexes too."

Which reminds me - if global temp tables do get added, it'll further
increase the desirability of exposing a feature to let users
disable+invalidate and then later reindex+enable indexes without icky
catalog hacking. So they can disable indexes for their local copy, load
data, re-enable indexes. That'd be "interesting" to implement for global
temp tables given that index state is part of the pg_index row associated
with an index rel though.


> 1. hold these data only in memory in special buffers
>>
>>
I don't see that working well for pg_statistic or anything else that holds
nontrivial user data though.

> 2. hold these data in global temporary tables - it is similar to normal
>> tables - we can use global temp tables for metadata like classic persistent
>> tables are used for metadata of classic persistent tables. Next syscache
>> can be enhanced to work with union of two system tables.
>>
>>
Very meta. Syscache and relcache are extremely performance critical but
could probably skip scans entirely on backends that haven't used any global
temp tables.

I don't know the relevant caches well enough to have a useful opinion here.

> I think that it not possible to assume that temporary data will aways fir
>> in memory.
>> So 1) seems to be not acceptable solution.
>>
>
It'd only be the metadata, but if it includes things like column histograms
and most frequent value data that'd still be undesirable to have pinned in
backend memory.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Add "password_protocol" connection parameter to libpq

2019-08-09 Thread Craig Ringer
On Fri, 9 Aug 2019 at 11:00, Michael Paquier  wrote:

> On Thu, Aug 08, 2019 at 03:38:20PM -0700, Jeff Davis wrote:
> > Libpq doesn't have a way to control which password protocols are used.
> > For example, the client might expect the server to be using SCRAM, but
> > it actually ends up using plain password authentication instead.
>
> Thanks for working on this!
>
> > I'm not 100% happy with the name "password_protocol", but other names I
> > could think of seemed likely to cause confusion.
>
> What about auth_protocol then?  It seems to me that it could be useful
> to have the restriction on AUTH_REQ_MD5 as well.
>
> > Sets the least-secure password protocol allowable when using password
> > authentication. Options are: "plaintext", "md5", "scram-sha-256", or
> > "scram-sha-256-plus".
>
> This makes it sound like there is a linear hierarchy among all those
> protocols, which is true in this case, but if the list of supported
> protocols is extended in the future it may be not.
>

Before we go too far along with this, lets look at how other established
protocols do things and the flaws that've been discovered in their
approaches. If this isn't done with extreme care then there's a large risk
of negating the benefits offered by adopting recent things like SCRAM.
Frankly I kind of wish we could just use SASL, but there are many (many)
reasons not to.

Off the top of my head I can think of these risks:

* Protocols that allow naïve pre-auth client/server auth negotiation (e.g.
by finding the overlap in exchanged supported auth-mode lists) are subject
to MiTM downgrade attacks where the attacker filters out protocols it
cannot intercept and break from the proposed alternatives.

* Protocols that specify a hierarchy tend to be inflexible and result in
hard to predict auth mode selections as the options grow. If my app wants
GSSAPI or SuperStrongAuth but doesn't accept SSPI, and the hierarchy is
GSSAPI > SSPI > SuperStrongAuth, it has to fall back to a disconnect and
retry model like now.

* Protocols that announce supported auth methods before any kind of trust
is established make life easier for vulnerability scanners and worms

and I'm sure there are more when it comes to auth handshakes.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-09 Thread Craig Ringer
On Fri, 9 Aug 2019 at 22:07, Konstantin Knizhnik 
wrote:

>
>
> Ok, here it is: global_private_temp-1.patch
>

Fantastic.

I'll put that high on my queue.

I'd love to see something like this get in.

Doubly so if it brings us closer to being able to use temp tables on
physical read replicas, though I know there are plenty of other barriers
there (not least of which being temp tables using persistent txns not
vtxids).

Does it have a CF entry?


> Also I have attached updated version of the global temp tables with shared
> buffers - global_shared_temp-1.patch
>

Nice to see that split out. In addition to giving the first patch more hope
of being committed this time around, it'll help with readability and
testability too.

To be clear, I have long wanted to see PostgreSQL have the "session" state
abstraction you have implemented. I think it's really important for high
client count OLTP workloads, working with the endless collection of ORMs
out there, etc. So I'm all in favour of it in principle so long as it can
be made to work reliably with limited performance impact on existing
workloads and without making life lots harder when adding new core
functionality, for extension authors etc. The same goes for built-in
pooling. I think PostgreSQL has needed some sort of separation of
"connection", "backend", "session" and "executor" for a long time and I'm
glad to see you working on it.

With that said: How do you intend to address the likelihood that this will
cause performance regressions for existing workloads that use temp tables
*without* relying on your session state and connection pooler? Consider
workloads that use temp tables for mid-long txns where txn pooling is
unimportant, where they also do plenty of read and write activity on
persistent tables. Classic OLAP/DW stuff. e.g.:

* four clients, four backends, four connections, session-level connections
that stay busy with minimal client sleeps
* All sessions run the same bench code
* transactions all read plenty of data from a medium to large persistent
table (think fact tables, etc)
* transactions store a filtered, joined dataset with some pre-computed
window results or something in temp tables
* benchmark workload makes big-ish temp tables to store intermediate data
for its medium-length transactions
* transactions also write to some persistent relations, say to record their
summarised results

How does it perform with and without your patch? I'm concerned that:

* the extra buffer locking and various IPC may degrade performance of temp
tables;
* the temp table data in shared_buffers may put pressure on shared_buffers
space, competing with cached pages for persistent tables that all sessions
share;
* the temp table data in shared_buffers may put pressure on shared_buffers
space for dirty buffers, forcing writes of persistent tables out earlier and
therefore reducing write-combining opportunities.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-08 Thread Craig Ringer
On Thu, 8 Aug 2019 at 15:03, Konstantin Knizhnik 
wrote:

>
>
> On 08.08.2019 5:40, Craig Ringer wrote:
>
> On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>> New version of the patch with several fixes is attached.
>> Many thanks to Roman Zharkov for testing.
>>
>
> FWIW I still don't understand your argument with regards to using
> shared_buffers for temp tables having connection pooling benefits. Are you
> assuming the presence of some other extension in your extended  version of
> PostgreSQL ? In community PostgreSQL a temp table's contents in one backend
> will not be visible in another backend. So if your connection pooler in
> transaction pooling mode runs txn 1 on backend 42 and it populates temp
> table X, then the pooler runs the same app session's txn 2 on backend 45,
> the contents of temp table X are not visible anymore.
>
>
> Certainly here I mean built-in connection pooler which is not currently
> present in Postgres,
> but it is part of PgPRO-EE and there is my patch for vanilla at commitfest:
> https://commitfest.postgresql.org/24/2067
>

OK, that's what I assumed.

You're trying to treat this change as if it's a given that the other
functionality you want/propose is present in core or will be present in
core. That's far from given. My suggestion is to split it up so that the
parts can be reviewed and committed separately.


> In PgPRO-EE this problem was solved by binding session to backend. I.e.
> one backend can manage multiple sessions,
> but session can not migrate to another backend. The drawback of such
> solution is obvious: one long living transaction can block transactions of
> all other sessions scheduled to this backend.
> Possibility to migrate session to another backend is one of the obvious
> solutions of the problem. But the main show stopper for it is temporary
> tables.
> This is why  I consider moving temporary tables to shared buffers as very
> important step.
>

I can see why it's important for your use case.

I am not disagreeing.

I am however strongly suggesting that your patch has two fairly distinct
functional changes in it, and you should separate them out.

* Introduce global temp tables, a new relkind that works like a temp table
but doesn't require catalog changes. Uses per-backend relfilenode and
cleanup like existing temp tables. You could extend the relmapper to handle
the mapping of relation oid to per-backend relfilenode.

* Associate global temp tables with session state and manage them in
shared_buffers so they can work with the in-core connection pooler (if
committed)

Historically we've had a few efforts to get in-core connection pooling that
haven't gone anywhere. Without your pooler patch the changes you make to
use shared_buffers etc are going to be unhelpful at best, if not actively
harmful to performance, and will add unnecessary complexity. So I think
there's a logical series of patches here:

* global temp table relkind and support for it
* session state separation
* connection pooling
* pooler-friendly temp tables in shared_buffers

Make sense?


> But even if we forget about built-in connection pooler, don't you think
> that possibility to use parallel query plans for temporary tables is itself
> strong enough motivation to access global temp table through shared buffers?
>

I can see a way to share temp tables across parallel query backends being
very useful for DW/OLAP workloads, yes. But I don't know if putting them in
shared_buffers is the right answer for that. We have DSM/DSA, we have
shm_mq, various options for making temp buffers share-able with parallel
worker backends.

I'm suggesting that you not tie the whole (very useful) global temp tables
feature to this, but instead split it up into logical units that can be
understood, reviewed and committed separately.

I would gladly participate in review.

> Would DISCARD TEMP_BUFFERS meet your needs?
>
>
> Actually I have already implemented DropLocalBuffers function (three line
> of code:)
>
> [...]
>
> I do not think that we need such command at user level (i.e. have
> correspondent SQL command).
>

I'd be very happy to have it personally, but don't think it needs to be
tied in with your patch set here. Maybe I can cook up a patch soon.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: POC: converting Lists into arrays

2019-08-08 Thread Craig Ringer
On Thu, 8 Aug 2019 at 12:18, Andres Freund  wrote:

> Hi,
>
> On 2019-08-08 11:36:44 +0800, Craig Ringer wrote:
> > > you can only put one  into the first element of a
> > > for (;;).
> > >
> >
> > Use an anonymous block outer scope? Or if not permitted even by C99
> (which
> > I think it is), a do {...} while (0);  hack?
>
> You can't easily - the problem is that there's no real way to add the
> closing }, because that's after the macro.


Ah, right. Hence our

PG_TRY();
{
}
PG_CATCH();
{
}
PG_END_TRY();

construct in all its beauty.

I should've seen that.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Proposal for Signal Detection Refactoring

2019-08-07 Thread Craig Ringer
On Wed, 6 Mar 2019 at 17:38, Chris Travers  wrote:

>
> Here's a new patch.  No rush on it.  I am moving it to next commitfest
> anyway because as code documentation I think this is a low priority late in
> the release cycle.
>

While you're looking at signal detection changes I suggest making sure you
get them right for the contribs that use explicit signal handling,
like src/test/modules/worker_spi/ .

I'm actually pretty sure worker_spi is incorrect as it stands in the
current codebase. It defines its own worker_spi_sighup
and worker_spi_sigterm handlers. worker_spi_sigterm() sets a static
bool got_sigterm that's scoped to worker_spi.c. Importantly it does NOT set
ipc.c's InterruptPending or ProcDiePending so the CHECK_FOR_INTERRUPTS()
macro will not notice when a backend running worker_spi gets a SIGTERM. So
long as worker_spi's own main loop is serviced regularly that's fine, but
it means worker_spi won't react to signals at all if it's busy in a query
or somewhere else in postgres code.
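
For illustration, a handler that also arms CHECK_FOR_INTERRUPTS() might look
like the sketch below. The names are illustrative, not worker_spi's, and
whether FATAL-on-interrupt semantics are what a given worker actually wants
is a separate question.

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"

static volatile sig_atomic_t got_sigterm = false;

static void
my_worker_sigterm(SIGNAL_ARGS)
{
    int         save_errno = errno;

    got_sigterm = true;
    /* Also arm CHECK_FOR_INTERRUPTS() so we react while busy elsewhere. */
    InterruptPending = true;
    ProcDiePending = true;
    SetLatch(MyLatch);

    errno = save_errno;
}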

It's also worth paying attention to the walsender, which has its own signal
handling, and pretty much anywhere else that has a pqsignal(...) call that
isn't SIG_IGN, die, quickdie, startup_die or procsignal_sigusr1_handler:

git grep -P 'pqsignal\(SIG(INT|HUP|TERM|QUIT|USR1|USR2),
(?!die|startup_die|quickdie|procsignal_sigusr1_handler|SIG_IGN)'
src/backend/

I actually found a walsender signal handling issue recently, per
https://www.postgresql.org/message-id/CAMsr%2BYEuz4XwZX_QmnX_-2530XhyAmnK%3DzCmicEnq1vLr0aZ-g%40mail.gmail.com
 .

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: POC: converting Lists into arrays

2019-08-07 Thread Craig Ringer
On Thu, 1 Aug 2019 at 07:40, Tom Lane  wrote:

> Andres Freund  writes:
> > Unfortunately foreach(ListCell *lc, ...) doesn't work with the current
> > definition. Which I think isn't great, because the large scopes for loop
> > iteration variables imo makes the code harder to reason about.
>
>
Totally agree.


>
> you can only put one  into the first element of a
> for (;;).
>

Use an anonymous block outer scope? Or if not permitted even by C99 (which
I think it is), a do {...} while (0);  hack?


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Global temporary tables

2019-08-07 Thread Craig Ringer
On Tue, 6 Aug 2019 at 16:32, Konstantin Knizhnik 
wrote:

> New version of the patch with several fixes is attached.
> Many thanks to Roman Zharkov for testing.
>

FWIW I still don't understand your argument with regards to using
shared_buffers for temp tables having connection pooling benefits. Are you
assuming the presence of some other extension in your extended version of
PostgreSQL? In community PostgreSQL a temp table's contents in one backend
will not be visible in another backend. So if your connection pooler in
transaction pooling mode runs txn 1 on backend 42 and it populates temp
table X, then the pooler runs the same app session's txn 2 on backend 45,
the contents of temp table X are not visible anymore.

Can you explain? Because AFAICS so long as temp table contents are
backend-private there's absolutely no point ever using shared buffers for
their contents.

Perhaps you mean that in a connection pooling case, each backend may land
up filling up temp buffers with contents from *multiple different temp
tables*? If so, sure, I get that, but the answer there seems to be to
improve eviction and memory accounting, not make backends waste precious
shared_buffers space on non-shareable data.

Anyhow, I strongly suggest you simplify the feature to add the basic global
temp table feature so the need to change pg_class, pg_attribute etc to use
temp tables is removed, but separate changes to temp table memory handling
etc into a follow-up patch. That'll make it smaller and easier to review
and merge too. The two changes are IMO logically quite separate anyway.

Come to think of it, I think connection poolers might benefit from an
extension to the DISCARD command, say "DISCARD TEMP_BUFFERS", which evicts
temp table buffers from memory *without* dropping the temp tables. If
they're currently in-memory tuplestores they'd be written out and evicted.
That way a connection pooler could "clean" the backend, at the cost of some
probably pretty cheap buffered writes to the system buffer cache. The
kernel might not even bother to write out the buffercache and it won't be
forced to do so by fsync, checkpoints, etc, nor will the writes go via WAL
so such evictions could be pretty cheap - and if not under lots of memory
pressure the backend could often read the temp table back in from system
buffer cache without disk I/O.

That's my suggestion for how to solve your pooler problem, assuming I've
understood it correctly.

Along these lines I suggest adding the following to DISCARD at some point,
obviously not as part of your patch:

* DISCARD TEMP_BUFFERS
* DISCARD SHARED_BUFFERS
* DISCARD TEMP_FILES
* DISCARD CATALOG_CACHE
* DISCARD HOLD_CURSORS
* DISCARD ADVISORY_LOCKS

where obviously DISCARD SHARED_BUFFERS would be superuser-only and evict
only clean buffers.

(Also, if we extend DISCARD let's also allow it to be written as DISCARD (LIST,
OF, THINGS, TO, DISCARD) so that we can make the syntax extensible for
plugins in future).

Thoughts?

Would DISCARD TEMP_BUFFERS meet your needs?


Re: Global temporary tables

2019-07-31 Thread Craig Ringer
, session advisory locks and more.

I don't see how global temp tables will do you the slightest bit of good
here as the data in them will still be backend-local. If it isn't then you
should just be using unlogged tables.


> Definition of this table (metadata) is shared by all backends but data
> is private to the backend. After session termination data is obviously
> lost.
>

+1 that's what a global temp table should be, and it's IIRC pretty much how
the SQL standard specifies temp tables.

I suspect I'm overlooking some complexities here, because to me it seems
like we could implement these fairly simply. A new relkind would identify
it as a global temp table and the relfilenode would be 0. Same for indexes
on temp tables. We'd extend the relfilenode mapper to support a
backend-local non-persistent relfilenode map that's used to track temp
table and index relfilenodes. If no relfilenode is defined for the table,
the mapper would allocate one. We already happily create missing
relfilenodes on write so we don't even have to pre-create the actual file.
We'd register the relfilenode as a tempfile and use existing tempfile
cleanup mechanisms, and we'd use the temp tablespace to store it.

I must be missing something important because it doesn't seem hard.
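
To make that concrete, here's a very rough sketch of the kind of
backend-local map I mean. All names are mine, and locking, invalidation,
subtransaction abort and cleanup are ignored entirely:

#include "postgres.h"
#include "catalog/catalog.h"
#include "catalog/pg_class.h"
#include "miscadmin.h"
#include "utils/hsearch.h"

typedef struct SessionRelFileNodeEntry
{
    Oid         relid;          /* hash key */
    Oid         relfilenode;    /* assigned lazily on first access */
} SessionRelFileNodeEntry;

static HTAB *session_relfilenode_map = NULL;

static Oid
SessionGetRelFileNode(Oid relid)
{
    SessionRelFileNodeEntry *entry;
    bool        found;

    if (session_relfilenode_map == NULL)
    {
        HASHCTL     ctl;

        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = sizeof(Oid);
        ctl.entrysize = sizeof(SessionRelFileNodeEntry);
        session_relfilenode_map = hash_create("session relfilenode map",
                                              64, &ctl,
                                              HASH_ELEM | HASH_BLOBS);
    }

    entry = (SessionRelFileNodeEntry *)
        hash_search(session_relfilenode_map, &relid, HASH_ENTER, &found);
    if (!found)
        entry->relfilenode = GetNewRelFileNode(MyDatabaseTableSpace, NULL,
                                               RELPERSISTENCE_TEMP);

    return entry->relfilenode;
}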

> Global temporary tables are accessed though shared buffers (to solve
> problem 2).
>

I'm far from convinced of the wisdom or necessity of that, but I haven't
spent as much time digging into this problem as you have.


> The drawback of such approach is that it will be necessary to
> reimplement large bulk of heapam code.
> But this approach allows to eliminate visibility check for temporary
> table tuples and decrease size of tuple header.
>

That sounds potentially cool, but perhaps a "next step" thing? Allow the
creation of global temp tables to specify reloptions, and you can add it as
a reloption later. You can't actually eliminate visibility checks anyway
because they're still MVCC heaps. Savepoints can create invisible tuples
even if you're using temp tables that are cleared on commit, and of course
so can DELETEs or UPDATEs. So I'm not sure how much use it'd really be in
practice.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: concerns around pg_lsn

2019-07-31 Thread Craig Ringer
On the topic of pg_lsn, I recently noticed that there's no
operator(+)(pg_lsn,bigint) nor is there an operator(-)(pg_lsn,bigint) so
you can't compute offsets easily. We don't have a cast between pg_lsn and
bigint because we don't expose an unsigned bigint type in SQL, so you can't
work around it that way.

I may be missing the obvious, but I suggest (and will follow with a patch
for) adding + and - operators for computing offsets. I was considering an
age() function for it too, but I think it's best to force the user to be
clear about what current LSN they want to compare with so I'll skip that.
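
The C support function for the offset operators is tiny. A sketch follows;
the function name and the omission of overflow checks are mine, while
PG_GETARG_LSN and PG_RETURN_LSN are the existing macros from utils/pg_lsn.h:

#include "postgres.h"
#include "fmgr.h"
#include "utils/pg_lsn.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(pg_lsn_add_bytes);

Datum
pg_lsn_add_bytes(PG_FUNCTION_ARGS)
{
    XLogRecPtr  lsn = PG_GETARG_LSN(0);
    int64       nbytes = PG_GETARG_INT64(1);

    /* A real version should check for overflow/underflow here. */
    PG_RETURN_LSN(lsn + nbytes);
}

operator(+)(pg_lsn, bigint) is then just an operator definition pointing at
it, and operator(-)(pg_lsn, bigint) is the same thing with a negated offset.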

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: tap tests driving the database via psql

2019-07-30 Thread Craig Ringer
On Wed, 31 Jul 2019 at 10:20, Andres Freund  wrote:

> Hi,
>
> On 2019-07-31 09:32:10 +0800, Craig Ringer wrote:
> > OK. So rather than building our own $everything from scratch, lets look
> at
> > solving that.
>
> IDK, a minimal driver that just does what we need it to do is a few
> hundred lines, not more. And there's plenty of stuff that we simply
> won't be able to test with any driver that's not purposefully written
> for testing. There's e.g. basically no way with any of the drivers to
> test intentionally bogus sequences of protocol messages, yet that's
> something we really ought to test.
>
> I've written custom hacks^Wprograms to tests things like that a few
> times. That really shouldn't be necessary.
>
>
That's a good point. I've had to write simple protocol hacks myself, and
having a reusable base tool for it would indeed be valuable.

I'm just worried it'll snowball into Yet Another Driver We Don't Need, or
land up as a half-finished poorly-maintained burden nobody wants to touch.
Though the general fondness for and familiarity with Perl in the core
circle of Pg contributors and committers would probably help here, since
it'd inevitably land up being written in Perl...

I'm interested in evolution of the protocol and ways to enhance it, and I
can see something we can actively use for protocol testing being valuable.
If done right, the protocol implementation could probably pass-through
inspect messages from regular libpq etc as well as serving as either a
client or even as a dumb server simulator for pregenerated responses for
client testing.

We certainly couldn't do anything like that with existing tools and by
reusing existing drivers.

I wonder if there's any way to make writing and maintaining it less painful
though. Having to make everything work with Perl 5.8.8 and no non-core
modules leads to a whole pile of bespoke code and reinvention.


> > Perl modules support local installations. Would it be reasonable to
> require
> > DBI to be present, then rebuild DBD::Pg against our libpq during our own
> > test infra compilation, and run it with a suitable LD_LIBRARY_PATH or
> using
> > rpath? That'd actually get us a side-benefit of some test coverage of
> > DBD::Pg and libpq.
>
> You're intending to download DBD::Pg?


That'd be a reasonable option IMO, yes. Either via git with https, or a
tarball where we check a signature. So long as it can use any pre-existing
local copy without needing to hit the Internet, and the build can also
succeed without the presence of it at all, I think that'd be reasonable
enough.

But I'm probably getting contaminated by excessive exposure to Apache Maven
when I have to work with Java. I'm rapidly getting used to downloading
things being a part of builds. Personally I'd expect that unless
specifically testing new libpq functionality etc, most people would just be
using their system DBD::Pg... and I consider linking the system DBD::Pg
against our new-compiled libpq more feature than bug, as if we retain the
same soname we should be doing a reasonable job of not breaking upward
compatibility anyway. (It'd potentially be an issue if you have a very new
system libpq and are running tests on an old postgres, I guess).


> Or how would you get around the licensing issues? Downloading it will have
> some issues too: For one at least I
> often want to be able to run tests when offline; there's security
> concerns too.
>

Roughly what I was thinking of was:

For Pg source builds (git, or dist tarball), we'd generally curl a tarball
of DBD::Pg from an HTTPS URL specified in Makefile variables (with ?= so it
can be overridden to point to an alt location, different version, local
path, etc) and unpack it, then build it. The makefile can also store a
signing key fingerprint so we can download the sig file and check the sig
is by the expected signer if the user has imported the signing key for
DBD::Pg.

We could test if the target directory already exists and is populated and
re-use it, so people can git-clone DBD::Pg if they prefer.

And we'd allow users to specify --with-system-dbd-pg at configure time, or
--without-dbd-pg .

The Pg perl libraries for our TAP test harness/wrappers would offer a
simple function to skip a test if DBD::Pg is missing, a simple function to
skip a test if the loaded DBD::Pg lacks $some_feature_or_attribute, etc.






>
> Greetings,
>
> Andres Freund
>


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: tap tests driving the database via psql

2019-07-30 Thread Craig Ringer
On Tue, 30 Jul 2019 at 21:40, Tom Lane  wrote:

> Craig Ringer  writes:
> > On Sun, 28 Jul 2019 at 03:15, Andres Freund  wrote:
> >> 1) Just depend on DBD::Pg being installed. It's fairly common, after
> >> all. It'd be somewhat annoying that we'd often end up using a
> >> different version of libpq than what we're testing against. But in
> >> most cases that'd not be particularly problematic.
>
> > I advocated for this in the past, and still think it's the best option.
>
> I think the not-same-libpq issue would be a much bigger problem than either
> of you are accounting for.


OK. So rather than building our own $everything from scratch, lets look at
solving that. I'd argue that the PostgreSQL community has enough work to do
maintaining the drivers that are fairly directly community-owned like
PgJDBC and psqlODBC, let alone writing new ones for less-in-demand
languages just for test use.

Perl modules support local installations. Would it be reasonable to require
DBI to be present, then rebuild DBD::Pg against our libpq during our own
test infra compilation, and run it with a suitable LD_LIBRARY_PATH or using
rpath? That'd actually get us a side-benefit of some test coverage of
DBD::Pg and libpq.

Note that I'm not saying this is the only or right way. Your concerns about
using a system libpq are entirely valid, as are your concerns about trying
to link to our libpq using a DBD::Pg originally built against the system
libpq. I've just been dealing with issues similar to those today and I know
how much of a hassle it can be. However, I don't think "it's hard" is a
good reason to write a whole new driver and potentially do a
Netscape/Mozilla.


> * Since we'd have to presume a possibly-very-back-rev libpq, we could
> never add tests related to any recent libpq bug fixes or new features.
>

We can feature-test. We're not dealing with pg_regress where it's
essentially impossible to do anything conditionally without blocks of
plpgsql. We're dealing with Perl and Test::More where we can make tests
conditional, we can probe the runtime environment to decide whether we can
run a given test, etc.

Also, lets compare to the status quo. Will it be worse than exec()ing psql
with a string argument and trying to do sane database access via stdio?
Definitely. Not.


> * The system libpq might have a different idea of default port
> and/or socket directory than the test build.  Yeah, we could work
> around that by forcing the choice all the time, but it would be a
> constant hazard.
>

I'd call that a minor irritation personally, as all we have to do is set up
the environment and we're done.


> * We don't have control over what else gets brought in beside libpq.
>

That's a very significant point that we must pay attention to.

Anyone who's worked with nss will probably know the pain of surprise
library conflicts arising from nss plugins being loaded unexpectedly into
the program. I still twitch thinking about libnss-ldap.

It'd make a lot of sense to capture "ldd" output and/or do a minimal run
with LD_DEBUG set during buildfarm test runs to help identify any
interesting surprises.


> Depending on the whims of the platform packagers, there might need
> to be other parts of the platform's default postgres installation,
> eg psql, sitting in one's PATH.  Again, this wouldn't be too much
> of a hazard for pre-debugged test scripts --- but it would be a huge
> hazard for developers, who do lots of things manually and would always be
> at risk of invoking the wrong psql. I learned long ago not to have any
> part of a platform's postgres packages in place on my development systems,
> and I will fight hard against any test procedure change that requires me
> to do differently.
>

TBH I think that's a habit/procedure issue. That doesn't make it invalid,
but it might not be as big a concern for everyone as for you. I have a
shell alias that sets up my environment and another that prints the current
setup. I never run into issues like this, despite often multi-tasking while
tired. Not only that, I find platform postgres extremely useful to have on
my systems to use for simple cross-version tests.

If we adopt something like I suggested above where we (re)build DBD::Pg
against our installed pg and libpq, that wouldn't be much of an issue
anyway. We'd want a nice way to set up the runtime environment in a shell
for manual testing of course, like a generated .sh we can source. But
frankly I'd find that a useful thing to have in postgres anyway. I can't
count the number of times I've wished there was an easy way to pause
pg_regress and launch a psql session against its temp instance, or do the
same with a TAP test.

> Now, none of these things are really a problem with DBD/DBI as such
> --- rather, they are reasons not to depend on a pre-packaged build
> of DBD::Pg that depends on a pre

Re: tap tests driving the database via psql

2019-07-30 Thread Craig Ringer
On Sun, 28 Jul 2019 at 03:15, Andres Freund  wrote:

> 1) Just depend on DBD::Pg being installed. It's fairly common, after
>all. It'd be somewhat annoying that we'd often end up using a
>different version of libpq than what we're testing against. But in
>most cases that'd not be particularly problematic.
>

I advocated for this in the past, and still think it's the best option.

>
> 4) We develop a fairly minimal pure perl database driver, that doesn't
>depend on DBI. Include it somewhere as part of the test code, instead
>of src/interfaces, so it's clearer that it's not ment as an actual
>official driver.
>

Why not write a new language interpreter while we're at it, and maybe a
compiler and runtime? :p

The community IMO wastes *so* much time on not-invented-here make-work and
on jumping through hoops to avoid depending on anything newer than the late
'90s. I'd rather not go deeper down that path. If someone on SunOS or SCO
OpenServer or whatever doesn't want to install DBD::Pg, have the TAP tests
just skip the tests on that platform and make it the platform owner's
problem.


> Craig, IIRC you'd thought about this before too?
>

Right. But IIRC Tom vetoed it on grounds of not wanting to expect buildfarm
operators to install it, something like that.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


[PATCH] Race condition in logical walsender causes long postgresql shutdown delay

2019-07-24 Thread Craig Ringer
Hi folks

I recently tracked down a race in shutdown of logical walsenders that can
cause PostgreSQL shutdown to hang for wal_sender_timeout/2 before it
continues to a normal shutdown. With a long timeout that can be quite
disruptive.

TL;DR: The logical walsender may be signalled to stop, then read the last
WAL record before the shutdown checkpoint is due to be written and go to
sleep. The checkpointer will wait for it to acknowledge the shutdown and
the walsender will wait for new WAL. The deadlock is eventually broken by
the walsender timeout keepalive timer.

Patch attached.

The issue arises from the difference between logical and physical walsender
shutdown as introduced by commit c6c3334364 "Prevent possibility of panics
during shutdown checkpoint". It's probably fairly hard to trigger. I ran
into a case where it happened regularly only because of an unrelated patch
that caused some WAL to be written just before the checkpointer issued
walsender shutdown requests. But it's still a legit bug.

If you hit the issue you'll see that walsender(s) can be seen to be
sleeping in WaitLatchOrSocket in WalSndLoop. They'll keep sleeping until
woken by the keepalive timeout. The checkpointer will be waiting in
WalSndWaitStopping() for the walsenders to enter WALSNDSTATE_STOPPING or
exit, whichever happens first. The postmaster will be waiting in ServerLoop
for the checkpointer to finish the shutdown checkpoint.

The checkpointer waits in WalSndWaitStopping() for all walsenders to either
exit or enter WALSNDSTATE_STOPPING state. Logical walsenders never enter
WALSNDSTATE_STOPPING, they go straight to exiting, so the checkpointer
can't finish WalSndWaitStopping() and write the shutdown checkpoint. A
logical walsender usually notices the shutdown request and exits as soon as
it has flushed all WAL up to the server's flushpoint, while physical
walsenders enter WALSNDSTATE_STOPPING.

But there's a race where a logical walsender may read the final available
record and notice it has caught up - but not notice that it has reached
end-of-WAL and check whether it should exit. This happens on the following
(simplified) code path in XLogSendLogical:

if (record != NULL)
{
    XLogRecPtr  flushPtr = GetFlushRecPtr();

    LogicalDecodingProcessRecord(...);
    sentPtr = ...;

    if (sentPtr >= flushPtr)
        WalSndCaughtUp = true;      /* <-- HERE */
}

because the test for got_STOPPING that sets got_SIGUSR2 is only on the
other branch where getting a record returns `NULL`; this branch can sleep
before checking if shutdown was requested.

So if the walsender read the last WAL record available, when it's >= the
flush pointer and it already handled the SIGUSR1 latch wakeup for the WAL
write, it might go back to sleep and not wake up until the timeout.

The checkpointer already sent PROCSIG_WALSND_INIT_STOPPING to the
walsenders in the prior WalSndInitStopping() call so the walsender won't be
woken by a signal from the checkpointer. No new WAL will be written because
the walsender just consumed the final record written before the
checkpointer went to sleep, and the checkpointer won't write anything more
until the walsender exits. The client might not be due a keepalive for some
time. The only reason this doesn't turn into a total deadlock is the
keepalive wakeup.

An alternative fix would be to have the logical walsender set
WALSNDSTATE_STOPPING instead of faking got_SIGUSR2, then go to sleep
waiting for more WAL. Logical decoding would need to check if it was
running during shutdown and Assert(...) then ERROR if it saw any WAL
records that result in output plugin calls or snapshot management calls. I
avoided this approach as it's more intrusive and I'm not confident I can
concoct a reliable test to trigger it.
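
For illustration, the shape of the fix is roughly the following. This is
only a sketch built on the simplified snippet above, not the attached patch
verbatim:

if (record != NULL)
{
    XLogRecPtr  flushPtr = GetFlushRecPtr();

    LogicalDecodingProcessRecord(...);
    sentPtr = ...;

    if (sentPtr >= flushPtr)
    {
        WalSndCaughtUp = true;

        /*
         * If a stop was already requested, behave as if we had hit
         * end-of-WAL instead of going back to sleep waiting for WAL that
         * will never arrive.
         */
        if (got_STOPPING)
            got_SIGUSR2 = true;
    }
}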

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
From 559dda09b35870d3630a65cbca682e50343c6f0f Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 25 Jul 2019 09:14:58 +0800
Subject: [PATCH] Fix a delay in PostgreSQL shutdown caused by logical
 replication

Due to a race with WAL writing during shutdown, if logical walsenders were
running then PostgreSQL's shutdown could be delayed by up to
wal_sender_timeout/2 while it waits for the walsenders to shut down. The
walsenders wait for new WAL or end-of-wal which won't come until shutdown so
there's a deadlock. The walsender timeout eventually breaks the deadlock.

The issue was introduced by PostgreSQL 10 commit c6c3334364
"Prevent possibility of panics during shutdown checkpoint".

A logical walsender never enters WALSNDSTATE_STOPPING and allows the
checkpointer to continue shutdown. Instead it exits when it reads end-of-WAL.
But if it reads the last WAL record written before shutdown and that record
doesn't generate a client network write, it can mark itself caught up and go to
sleep without checking to see whether a shutdown was requested.

Re: [BDR] Question on compatibility of Postgres with Open JDK

2019-01-30 Thread Craig Ringer
On Wed, 30 Jan 2019 at 15:28, Pratheej Chandrika <
pratheej.chandr...@motorolasolutions.com> wrote:

> Criag,
> Thanks a lot for the prompt reply.
>
> From what you mentioned, I assume that other the client side jdbc  there
> is no Java dependency from Server side. What I mean is I assume there is no
> jdk/java dependency for Postgres server to work. Request your input.
>

Correct.

Please see the PostgreSQL and PgJDBC manuals for more details.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: [BDR] Question on compatibility of Postgres with Open JDK

2019-01-29 Thread Craig Ringer
I don't understand the question.

If you want to use PgJDBC, then yes, PgJDBC works on OpenJDK. You do not
need to use pgjdbc 9.4 with PostgreSQL 9.4, you may use the latest version.

If you want to use pl/java, then I don't know if it's been tested with
postgres-bdr 9.4. But it should work fine if it works with normal community
postgres 9.4.


On Wed, 30 Jan 2019 at 14:28, Pratheej Chandrika <
pratheej.chandr...@motorolasolutions.com> wrote:

> Hello,
> We are using Postgresql (postgresql-bdr94).
>
> Please let us know whether Postgresql supports Open JDK?
>
> Thanks
> Pratheej
>
> --
> You received this message because you are subscribed to the Google Groups
> "Postgres-BDR and pglogical Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to bdr-list+unsubscr...@2ndquadrant.com.
> Visit this group at
> https://groups.google.com/a/2ndquadrant.com/group/bdr-list/.
> To view this discussion on the web visit
> https://groups.google.com/a/2ndquadrant.com/d/msgid/bdr-list/CA%2BOir%3DMAc9OU4iNm%3DRBr%2BF2fLOSjJTACT9X5brChBGVCAJdPzA%40mail.gmail.com
> <https://groups.google.com/a/2ndquadrant.com/d/msgid/bdr-list/CA%2BOir%3DMAc9OU4iNm%3DRBr%2BF2fLOSjJTACT9X5brChBGVCAJdPzA%40mail.gmail.com?utm_medium=email_source=footer>
> .
>


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Reviving the "Stopping logical replication protocol" patch from Vladimir Gordichuk

2019-01-13 Thread Craig Ringer
On Mon, 3 Dec 2018 at 19:38, Dave Cramer  wrote:

> Dmitry,
>
> Please see attached rebased patches
>

I'm fine with patch 0001, though I find this comment a bit hard to follow:

+ * The send_data callback must enqueue complete CopyData messages to libpq
+ * using pq_putmessage_noblock or similar, since the walsender loop may
send
+ * CopyDone then exit and return to command mode in response to a client
+ * CopyDone between calls to send_data.

I think it needs splitting up into a couple of sentences.


In patch 0002, stopping during a txn. It's important that once we skip any
action, we continue skipping. In patch 0002 I'd like it to be clearer that
we will *always* skip the rb->commit callback if rb->continue_decoding_cb()
returned false and aborted the while loop. I suggest storing the result of
(rb->continue_decoding_cb == NULL || rb->continue_decoding_cb())  in an
assignment within the while loop, and testing that result later.

e.g.

(continue_decoding = (rb->continue_decoding_cb == NULL ||
rb->continue_decoding_cb()))

and later

if (continue_decoding) {
rb->commit(rb, txn, commit_lsn);
}

I don't actually think it's necessary to re-test the continue_decoding
callback and skip commit here. If we've sent all of the txn
except the commit, we might as well send the commit, it's tiny and might
save some hassle later.


I think a comment on the skipped commit would be good too:

/*
 * If we skipped any changes due to a client CopyDone we must not send a
 * commit, otherwise the client would incorrectly think it received a
 * complete transaction.
 */
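
Putting those suggestions together, the tail of the loop might look roughly
like this. It's only a sketch: continue_decoding_cb is shown argument-less
exactly as in the fragments above (whatever its real signature is in the
patch), and changes_remain stands in for the loop's real condition.

bool        continue_decoding = true;

while (changes_remain)
{
    continue_decoding = (rb->continue_decoding_cb == NULL ||
                         rb->continue_decoding_cb());
    if (!continue_decoding)
        break;                  /* once we skip anything, keep skipping */

    /* ... process the next change ... */
}

/*
 * If we skipped any changes due to a client CopyDone we must not send a
 * commit, otherwise the client would incorrectly think it received a
 * complete transaction.
 */
if (continue_decoding)
    rb->commit(rb, txn, commit_lsn);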

I notice that the fast-path logic in WalSndWriteData is removed by this
patch. It isn't moved; there's no comparison of last_reply_timestamp
and wal_sender_timeout now. What's the rationale there? You want to ensure
that we reach ProcessRepliesIfAny()? Can we cheaply test for a readable
client socket then still take the fast-path if it's not readable?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise


Re: Early WIP/PoC for inlining CTEs

2018-11-18 Thread Craig Ringer
On Sat, 17 Nov 2018 at 10:12, Stephen Frost  wrote:

> Greetings,
>
> * Andrew Gierth (and...@tao11.riddles.org.uk) wrote:
> > >>>>> "Tom" == Tom Lane  writes:
> >
> >  >> [ inlining-ctes-v5.patch ]
> >
> >  Tom> I took a little bit of a look through this.  Some thoughts:
> >
> >  Tom> * I think it'd be a good idea if we made OFFSET/LIMIT in a CTE be
> >  Tom> an alternate way of keeping it from being inlined. As the patch
> >  Tom> stands, if that's the behavior you want, you have no way to
> >  Tom> express it in a query that will also work in older servers. (I
> >  Tom> will manfully resist suggesting that then we don't need the
> >  Tom> nonstandard syntax at all ... oops, too late.)
> >
> > I think this is the wrong approach, because you may want the
> > optimization-barrier effects of OFFSET/LIMIT _without_ the actual
> > materialization - there is no need to force a query like
> >
> > with d as (select stuff from bigtable offset 1) select * from d;
> >
> > to push all the data through an (on-disk) tuplestore.
>
> Agreed, there's going to be cases where you want the CTE to be inlined
> even with OFFSET/LIMIT.  Let's please not cater to the crowd who
> happened to know that they could hack around with OFFSET/LIMIT to make
> something not be inlined when it comes to the question of if the CTE
> should be inlined or not.  That's the same issue we were argueing around
> when discussing if we should allow parallel array_agg, imv.
>
> Particularly since, with CTEs anyway, we never inlined them, so the
> whole OFFSET/LIMIT thing doesn't really make any sense- today, if you
> wrote a CTE, you wouldn't bother with OFFSET/LIMIT because you knew it
> wasn't going to be inlined, that entire line of thinking is for
> subqueries, not CTEs.  If you're going to force people to change their
> CTEs to require that they not be inlined, let's not pick a method which
> makes it ambiguous and makes us have to ask "do they really want this
> limit/offset, or did they just want to make the CTE not be inlined...?"
>
>
To satisfy Tom's understandable desire to let people write queries that
behave the same on old and new versions, can we get away with back-patching
the MATERIALIZED parser enhancement as a no-op in point releases?
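
For example (sketch only; the exact keyword is whatever this thread settles
on), a query written as

WITH d AS MATERIALIZED (SELECT stuff FROM bigtable)
SELECT * FROM d;

would then parse on a patched older point release, where it changes nothing
because CTEs are always materialized there anyway, while on new servers it
explicitly requests the old optimisation-fence behaviour.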


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Postgres, fsync, and OSs (specifically linux)

2018-10-18 Thread Craig Ringer
On Fri, 19 Oct 2018 at 07:27, Thomas Munro 
wrote:

>
> 2.  I am +1 on back-patching Craig's PANIC-on-failure logic.  Doing
> nothing is not an option I like.  I have some feedback and changes to
> propose though; see attached.
>

Thanks very much for the work on reviewing and revising this.


> I don't see why sync_file_range(SYNC_FILE_RANGE_WRITE) should get a
> pass here.  Inspection of some version of the kernel might tell us it
> can't advance the error counter and report failure, but what do we
> gain by relying on that?  Changed.
>

I was sure it made sense at the time, but I can't explain that decision
now, and it looks like we should treat it as a failure.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Code of Conduct plan

2018-09-19 Thread Craig Ringer
ame it wasn't handled by a complaint to a conference CoC group
instead.

I'd like the CoC to emphasise that while we don't want to restrain people
from "calling out" egregious behaviour, going via the CoC team is often
more likely to lead to constructive communication and positive change.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Postgres, fsync, and OSs (specifically linux)

2018-08-29 Thread Craig Ringer
On 15 August 2018 at 07:32, Thomas Munro 
wrote:

> On Wed, Aug 15, 2018 at 11:08 AM, Asim R P  wrote:
> > I was looking at the commitfest entry for feature
> > (https://commitfest.postgresql.org/19/1639/) for the most recent list
> > of patches to try out.  The list doesn't look correct/complete.  Can
> > someone please check?
>
> Hi Asim,
>
> This thread is a bit tangled up.  There are two related patchsets in it:
>
> 1.  Craig Ringer's PANIC-on-EIO patch set, to cope with the fact that
> Linux throws away buffers and errors after reporting an error, so the
> checkpointer shouldn't retry as it does today.  The latest is here:
>
> https://www.postgresql.org/message-id/CAMsr%2BYFPeKVaQ57PwHqmRNjPCPABsdbV%
> 3DL85he2dVBcr6yS1mA%40mail.gmail.com
>
> 2.  Andres Freund's fd-sending fsync queue, to cope with the fact that
> some versions of Linux only report writeback errors that occurred
> after you opened the file, and all versions of Linux and some other
> operating systems might forget about writeback errors while no one has
> it open.
>
> Here is the original patchset:
>
> https://www.postgresql.org/message-id/20180522010823.
> z5bdq7wnlsna5qoo%40alap3.anarazel.de
>
> Here is a fix-up you need:
>
> https://www.postgresql.org/message-id/20180522185951.
> 5sdudzl46spktyyz%40alap3.anarazel.de
>
> Here are some more fix-up patches that I propose:
>
> https://www.postgresql.org/message-id/CAEepm%3D2WSPP03-20XHpxohSd2UyG_
> dvw5zWS1v7Eas8Rd%3D5e4A%40mail.gmail.com
>
> I will soon post some more fix-up patches that add EXEC_BACKEND
> support, Windows support, and a counting scheme to fix the timing
> issue that I mentioned in my first review.  I will probably squash it
> all down to a tidy patch-set after that.
>


Thanks very much, Thomas.

I've had to back off from this a bit after posting my initial
panic-for-safety patch, as the changes Andres proposed are a bit out of my
current depth and time capacity.

I still think the panic patch is needed and appropriate, but agree it's not
*sufficient*.


Allow postgres_fdw passwordless non-superuser conns with prior superuser permission

2018-08-05 Thread Craig Ringer
Hi all

Currently postgres_fdw cannot be used with 'cert' authentication, i.e.
client-certificate validation and cert cn => postgres username mapping. You
also can't use things like Kerberos, SSPI, etc with a superuser-created FDW
and username map.

To permit this, I'd like to allow postgres_fdw user mappings to be created
with a new 'permit_passwordless' option. Only the superuser is allowed to
create such a mapping. If it's set to true, we bypass the
check_conn_params(...)
connection-string password check and the connect_pg_server(...) check for
the conn using a password when a non-superuser establishes a connection.

This doesn't re-open CVE-2007-6601 because the superuser has to explicitly
grant the access.

To make SSL client certs work properly with FDWs, I'd also like to add a
libpq 'sslpassword' parameter, which corresponds to the PgJDBC
parameter of the same name. This lets the superuser create user mappings
for ssl client cert auth that use a user-specific 'sslcert', 'sslkey', and
'sslpassword'. Users can't use each other's keys because they don't know
the key password, and they can't create passwordless user mappings anyway.
Without 'sslpassword', a non-superuser could make a connection to an
'md5 clientcert=1' server using another user's client cert. It's a trivial
change.
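
To illustrate the intended usage (sketch only: the option names are as
proposed above, everything else (server name, role, file paths) is made up):

CREATE SERVER remotesrv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'db.example.com', dbname 'appdb', sslmode 'verify-full');

-- superuser-only, because it bypasses the password checks
CREATE USER MAPPING FOR appuser SERVER remotesrv
    OPTIONS (permit_passwordless 'true',
             sslcert '/etc/postgres/fdw/appuser.crt',
             sslkey '/etc/postgres/fdw/appuser.key',
             sslpassword 'appuser-key-passphrase');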

Patches to follow shortly.



Detailed explanation.

postgres_fdw assumes that connections that do not specify a password in the
connstr and connections that lack a password on the user mapping are
insecure and rejects them for non-superusers with:

ERROR: password is required
DETAIL: Non-superuser cannot connect if the server does not request a
password.
HINT: Target server's authentication method must be changed.

or

ERROR: password is required
DETAIL: Non-superusers must provide a password in the user mapping.

See check_conn_params and connect_pg_server
in contrib/postgres_fdw/connection.c.

This is because of CVE-2007-6601 for dblink.

The assumption is that a connection without a password must also lack any
other means of ensuring that the local postgres user really is the user
authorized to access the remote end as that remote postgres user. It's trying
to stop
use of 'peer' or 'ident', which we shouldn't permit because we'd be
allowing the non-superuser to potentially authenticate as the (usually)
'postgres' system-user and gain access to a 'postgres' superuser db
account. Or the connection might be using a service file or .pgpass in the
'postgres' user's home directory, in which case again we don't want a
non-superuser able to use it.

For 'cert' authentication you don't want to assume that the non-superuser
should be allowed to use the client certificate's private key file from the
file system. We don't offer a way to supply the ssl key as a literal in
the postgres_fdw user mapping. Nor is there a way to store the ssl key
encrypted and put the ssl key passphrase in the user mapping, making sure
that just the authorized user can access it.

The problem is that you can't then use 'cert' auth for postgres_fdw at all.

Nor can you authorize a non-superuser to use passwordless authentication if
you know your local server is configured securely (no 'trust', 'peer' or
'ident') and you *want* to authorize the user to use a service file,
pgpass, etc.

We should allow this if the user mapping creator is the superuser and they
explicitly mark the mapping as permit_passwordless. That won't let normal
users escalate.

If the superuser is the one creating the user mapping between local and
remote user IDs, then they're the ones delegating the access. They can
already GRANT mysuperuser TO public if they're feeling stupid; similarly,
if they CREATE USER MAPPING FOR public SERVER localhost OPTIONS
(permit_passwordless 'true', ...) then it's on them if they have 'peer' or
'ident' enabled
on the server.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Have an encrypted pgpass file

2018-07-23 Thread Craig Ringer
On 24 July 2018 at 05:53, Jeff Janes  wrote:

> On Wed, Jul 18, 2018 at 5:52 PM, Tom Lane  wrote:
>
>> Thomas Munro  writes:
>> > On Thu, Jul 19, 2018 at 5:46 AM, Marco van Eck 
>> wrote:
>> >> Since .pgpass files contain plain-text passwords, I searched for an
>> >> alternative.
>> >> In the attached patch I've added the possibility to run a command to
>> produce
>> >> the content of the pgpass file, in exactly the same format.
>>
>> > ... Here you side step those questions completely and make that the end
>> > user's problem.   I like it.
>>
>> ... but doesn't this just encourage people to build hacks that aren't
>> really any more secure than the unreadable-file approach?  In fact,
>> I'm afraid this would be an attractive nuisance, in that people would
>> build one-off hacks that get no security vetting and don't really work.
>>
>> I'd like to see a concrete example of a use-case that really does add
>> security; preferably one short and useful enough to put into the docs
>> so that people might copy-and-paste it rather than rolling their own.
>> It seems possible that something of the sort could be built atop
>> ssh-agent or gpg-agent, for instance.
>>
>
> If the goal is not unattended operation but just unannoying operation, I
> think the first example he provided is already that use-case.  If you
> already have gpg configured to use gpg-agent, then it just works.  You get
> encryption-at-rest, and you don't have to type in your password repeatedly
> in the same continuous shell session.
>

... and the attacker steals the key from gpg-agent.

Grabbing it from a process's memory is a bit harder than grabbing the contents
of a file, but not much harder. If the agent is remote then that's harder, but
you can just ask the script to decrypt the pgpass for you, so again, it's not
much of a win.

Even with a hardware crypto offload device the advantage here seems to be
mainly limited to making it harder to capture data from backups or
file-lifting attacks. Anything that can execute code or commands on the
host can still get the credentials.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Missing pg_control crashes postmaster

2018-07-23 Thread Craig Ringer
On 24 July 2018 at 03:31, Brian Faherty 
wrote:

> Hey Hackers,
>
> If a postmaster is running and the pg_control file is removed postgres
> will PANIC.


How does that happen?

"Don't go deleting stuff in pgdata" is pretty fundamental.


Re: Background worker/idle sessions and caching

2018-07-18 Thread Craig Ringer
On 19 July 2018 at 04:30, Jeremy Finzel  wrote:

> My use case is similar to the example of worker_spi.  A plpgsql function
> runs every 1 minute and processes records in audit tables in order to
> update fact tables with records that have changed.  I noticed for example
> renaming a column in the function was not picked up, and I had to restart
> the workers to reset the cache.
>

relation_openrv and relation_openrv_extended
call AcceptInvalidationMessages when the lockmode is not NoLock.

Don't we use those when doing SPI queries? I don't have time to check right
now; try setting some breakpoints or tracing through.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: untrusted PLs should be GRANTable

2018-07-18 Thread Craig Ringer
On 19 July 2018 at 08:23, Stephen Frost  wrote:

> Greetings,
>
> * Craig Ringer (cr...@2ndquadrant.com) wrote:
> > Untrusted PLs should be GRANTable with a NOTICE or WARNING telling the
> > admin that GRANTing an untrusted PL effectively gives the user the
> ability
> > to escape to superuser.
>
> I don't know that we really want to get into the business of issuing a
> NOTICE or WARNING in such cases.  We don't do that in a lot of other
> cases where non-superusers can be GRANT'd access which would allow them
> to become a superuser and if we start doing it now then we're going to
> need to go back and change the existing places to have such NOTICE or
> WARNING, or we'll be inconsistent about it, which would be worse.  I
> also worry that we'd start wanting to have NOTICEs for when we are
> allowing users to GRANT roles (like pg_monitor) that might get access to
> data that isn't obvious, even if they aren't able to become a superuser
> and it just gets ugly.
>
>
Good point.

I was mostly trying to anticipate concerns about people unwittingly
granting access to untrusted languages.

But hey, if you're using GRANT you should know what it means.

Alternatively,

GRANT USAGE ON [UNTRUSTED] LANGUAGE plpythonu;

and if you don't write UNTRUSTED we emit the existing error?

It at least means people have to think about it and recognise the
difference.

Not really convinced it's worth the hassle, but the "u" suffix isn't what
you'd call clearly a self-documenting warning of superuser-equivalent
rights either.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [bug fix] Produce a crash dump before main() on Windows

2018-07-18 Thread Craig Ringer
On 18 July 2018 at 14:34, Michael Paquier  wrote:

> On Wed, Jul 18, 2018 at 06:09:57AM +, Tsunakawa, Takayuki wrote:
> > From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> >> I reviewed patch and it works as per the subject, but I am not able to
> verify
> >> the actual
> >> bug that is reported in the upthread. The moving of setErrorMode() call
> >> to the start
> >> of the main function can handle all the cases that can lead to a crash
> with
> >> no popup.
> >
> > Thank you for reviewing the patch.  Yeah, I don't know reproduce the
> > problem too, because some bug of Symantec AntiVirus randomly caused
> > the crash before main()...
>
> A suggestion that I have here would be to patch manually the postmaster
> with any kind of code which would make it to crash before main() is
> called.  Anything is fine as long as a core dump is produced, right?
>
>
When I was doing the original crashdump support I wrote a crashme
extension that deliberately dereferenced a null pointer.

You'd do something similar in the postmaster for the testing.
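
From memory it was little more than this (a sketch, not the original code):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(crashme);

/* Deliberately crash the calling backend so a dump gets produced. */
Datum
crashme(PG_FUNCTION_ARGS)
{
    volatile int *p = NULL;

    *p = 42;                /* null-pointer write: SIGSEGV / access violation */
    PG_RETURN_VOID();       /* never reached */
}

plus the usual CREATE FUNCTION ... LANGUAGE C and a SELECT crashme(); to fire
it. For the before-main() case you'd put the equivalent dereference directly
into the postmaster's early startup path instead.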

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [bug fix] Produce a crash dump before main() on Windows

2018-07-17 Thread Craig Ringer
On 26 February 2018 at 12:06, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> > The patch proposed here means that early crashes will invoke WER. If
> we're
> > going to allow WER we should probably just do so unconditionally.
> >
> > I'd be in favour of leaving WER on when we find out we're in a
> noninteractive
> > service too, but that'd be a separate patch for pg11+ only.
>
> As for PG11+, I agree that we want to always leave WER on.  That is, call
> SetErrorMode(SEM_FAILCRITICALERRORS) but not specify
> SEM_NOGPFAULTERRORBOX.  The problem with the current specification of
> PostgreSQL is that the user can only get crash dumps in a fixed folder
> $PGDATA\crashdumps.  That location is bad because the crash dumps will be
> backed up together with the database cluster without the user noticing it.
> What's worse, the crash dumps are large.  With WER, the user can control
> the location and size of crash dumps.
>

Yeah, that's quite old and dates back to when Windows didn't offer much, if
any, control over WER in services.
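
In code terms, what's described above would be roughly this (sketch only; the
function name is made up and the real call would sit in the existing win32
startup code):

#include <windows.h>

static void
enable_wer_for_crashes(void)
{
    /*
     * Leave Windows Error Reporting able to produce crash dumps, but still
     * suppress the interactive critical-error dialogs: pass
     * SEM_FAILCRITICALERRORS and deliberately omit SEM_NOGPFAULTERRORBOX.
     */
    SetErrorMode(SEM_FAILCRITICALERRORS);
}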

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


untrusted PLs should be GRANTable

2018-07-16 Thread Craig Ringer
Hi all

A user has raised the point that our refusal to GRANT rights to untrusted
PLs is counterproductive and inconsistent with how we behave elsewhere.

Yes, untrusted PLs can be escaped to gain superuser rights, often trivially.

But we allow this:

CREATE ROLE superme SUPERUSER NOINHERIT;
GRANT superme TO me;

 and really, GRANTing an untrusted PL is similar.

Forcing users to create their PLs as a superuser increases the routine use
of superuser accounts. Most users' DDL deploy scripts will get run as a
superuser if they have to use a superuser for PL changes; they're not going
to SET ROLE and RESET ROLE around the function changes.

It also encourages users to make their untrusted functions SECURITY DEFINER
when still owned by a superuser, which we really don't want them doing
unnecessarily.

In the name of making things more secure, we've made them less secure.

Untrusted PLs should be GRANTable with a NOTICE or WARNING telling the
admin that GRANTing an untrusted PL effectively gives the user the ability
to escape to superuser.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


bgworkers should share more of BackendRun / PostgresMain

2018-07-11 Thread Craig Ringer
Hi folks

As I work on background workers almost exclusively at the moment, I keep on
running into things that are just not initialized, or not supported, in
bgworkers. Many are quite surprising.

I think bgworkers should just share more of the main Pg startup path.

My latest surprise is that we don't re-initialize the RNG for bgworkers,
because it's done in BackendRun before calling PostgresMain, and the bgworker
code path diverges before we call BackendRun.

bgworkers also have to do all their own error handling, duplicating lots of
logic in PostgresMain, if they want to have any error recovery except "die
and restart".

bgworkers don't participate in the ProcSignal mechanism, get query cancels,
etc, unless they do extra setup.

There are all sorts of other small surprises, like the way we don't set up
the dbname in shmem, so logs for bgworkers show it as empty.

There's no patch here because I'm not yet sure what the best approach is.
Some things seem clear, like moving the RNG init work from BackendRun a bit
earlier so all backends hit it. Other parts, much less so.

Look at the number of calls to InitProcess() from our profusion of custom
worker types, user backends, the bgworker infrastructure, etc.

I could just copy stuff from BackendRun / PostgresMain to
StartBackgroundWorker() but I'm sure that's not the right approach. I think
we need to decide what should be common and factor it out a bit.

So discussion/ideas appreciated.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: How can we submit code patches that implement our (pending) patents?

2018-07-11 Thread Craig Ringer
On 12 July 2018 at 09:10, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Nico Williams [mailto:n...@cryptonector.com]
> > On Thu, Jul 12, 2018 at 12:29:12AM +, Tsunakawa, Takayuki wrote:
> > > How can one make defensive use of his patent if he allows everyone to
> > > use it royalty-free?  Can he use his patent for cross-licensing
> > > negotiation if some commercial database vendor accuses his company
> > > that PostgreSQL unintentionally infringes that vendor's patent?
> >
> > https://en.wikipedia.org/wiki/Defensive_termination
>
> Thank you, his looks reasonable to give everyone the grant.  Then, I
> wonder if the community can accept patented code if the patent holder
> grants this kind of license.


Personally, I'd be in favour, but the core team has spoken here. I'm not
privy to the discussion but the outcome seems firm.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Usage of epoch in txid_current

2018-07-09 Thread Craig Ringer
On 10 July 2018 at 10:40, Andres Freund  wrote:

> On 2018-07-10 10:32:44 +0800, Craig Ringer wrote:
> > On 10 July 2018 at 07:35, Thomas Munro 
> > wrote:
> >
> > >
> > > I played around with this idea yesterday.  Experiment-grade patch
> > > attached.  Approach:
> > >
> > > 1.  Introduce a new type BigTransactionId (better names welcome).
> > >
> >
> > txid_current() should be changed to BigTransactionId too. It's currently
> > bigint.
>
> That's quite a bit more work though, no? We'd have to introduce a new
> SQL level type (including operators).  As I said nearby, I think that's
> a good idea, but I don't see any sort of benefit of forcing those
> patches to be done at once.
>

Yeah. You're right. One step at a time.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Usage of epoch in txid_current

2018-07-09 Thread Craig Ringer
On 10 July 2018 at 10:32, Craig Ringer  wrote:


>
>
>> I think it's probably a good idea to make it very explicit when moving
>> between big and small transaction IDs, hence the including of the word
>> 'big' in variable and function names and the use of a function-like
>> macro (rather than implicit conversion, which C doesn't give me a good
>> way to prevent).
>
>
> The only way I know of to prevent it is to use a wrapper by-value struct.
>
>
... and that's what I get for not finishing the thread before replying.

Anyway, please consider the other point re txid_current() and the user-facing
xid type.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Usage of epoch in txid_current

2018-07-09 Thread Craig Ringer
On 10 July 2018 at 07:35, Thomas Munro 
wrote:

>
> I played around with this idea yesterday.  Experiment-grade patch
> attached.  Approach:
>
> 1.  Introduce a new type BigTransactionId (better names welcome).
>

txid_current() should be changed to BigTransactionId too. It's currently
bigint.

Users are currently left in a real mess when it comes to pg_stat_activity
xids, etc., which are epoch-unqualified. txid_current() reports an
epoch-qualified xid, but there's no sensible and safe conversion between
txid_current()'s bigint and the epoch-unqualified 'xid' values used elsewhere.
age() takes 'xid', everything takes 'xid', and all of it is completely
oblivious to the epoch.
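
Just to illustrate the mismatch (an example only, not a proposed interface):

BEGIN;
SELECT txid_current();          -- epoch-qualified bigint
SELECT backend_xid              -- epoch-unqualified 'xid' for the same xact
FROM pg_stat_activity
WHERE pid = pg_backend_pid();
COMMIT;

There's no safe way, from SQL, to relate the two once the epoch has advanced.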

IMO all user-facing xids should be moved to BigTransactionId, with the
'xid' type ceasing to appear in any user-facing role.




> I think it's probably a good idea to make it very explicit when moving
> between big and small transaction IDs, hence the including of the word
> 'big' in variable and function names and the use of a function-like
> macro (rather than implicit conversion, which C doesn't give me a good
> way to prevent).


The only way I know of to prevent it is to use a wrapper by-value struct.
I've used this technique in the past where it's critical that implicit
conversions don't occur, but it sure isn't pretty.

typedef struct BigTransactionId
{
   uint64 value;
} BigTransactionId;

instead of

typedef uint64 BigTransactionId;

It's annoying having to get the value member, but it's definitely highly
protective against errors. Pass-by-value isn't a problem, neither is
return-by-value.

BigTransactionId
add_one(BigTransactionId xid)
{
    BigTransactionId ret;

    ret.value = xid.value + 1;
    return ret;
}

Personally I think it's potentially worth the required gymnastics if older
compilers do a sane job of code generation with this.
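
If we went that way, the conversions can still be cheap, explicit helpers
along these lines (hypothetical names; assumes the usual PostgreSQL typedefs
and the epoch-in-the-high-32-bits layout that txid_current() already exposes):

static inline BigTransactionId
MakeBigTransactionId(uint32 epoch, TransactionId xid)
{
    BigTransactionId big;

    big.value = ((uint64) epoch << 32) | (uint64) xid;
    return big;
}

static inline TransactionId
BigTransactionIdGetXid(BigTransactionId big)
{
    return (TransactionId) (big.value & 0xFFFFFFFF);
}

static inline uint32
BigTransactionIdGetEpoch(BigTransactionId big)
{
    return (uint32) (big.value >> 32);
}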

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: How can we submit code patches that implement our (pending) patents?

2018-07-09 Thread Craig Ringer
On 9 July 2018 at 15:51, Tsunakawa, Takayuki  wrote:

> From: davecra...@gmail.com [mailto:davecra...@gmail.com] On Behalf Of
> > Dave Cramer
> > Certainly there is history of people using PG code for non-PostgreSQL or
> > at least commercial derivative work. Greenplum for example.
>
> Yes, we would be glad if companies in the PostgreSQL community, including
> Greenplum, could utilize our patents and prosper with no charge.  We just
> want to use our (future) patents to make PostgreSQL a more wonderful DBMS,
> and to protect PostgreSQL against patent lawsuits by companies outside
> PostgreSQL community.


I can't speak for my employer, but to me it seems like a broad patent grant
that follows the code and derivatives would be reasonable.

A grant just for "PostgreSQL", not so much.

I'm not sure how much practical protection/benefit that'd offer unless it
came with a retaliation clause, though, and how it'd differ from an
unrestricted grant of rights for everyone. I mentioned that earlier. If you
can take a few snippets of PostgreSQL code and add them to your product to
inherit the patent grant, then in practice the patent grant is universal,
not specific to PostgreSQL and derivatives. But if you start requiring
"substantial" derivatives, etc, it gets messy and complicated.

I'm not sure how grants with retaliation clauses would be received by
others here. I lack the expertise to have much of an opinion myself. Is
that what you propose?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

