Re: hot_standby_feedback vs excludeVacuum and snapshots
On 9 June 2018 at 15:41, Amit Kapila wrote:
> On Fri, Jun 8, 2018 at 5:27 PM, Simon Riggs wrote:
>> On 8 June 2018 at 11:33, Amit Kapila wrote:
>>> On Fri, Jun 8, 2018 at 1:53 PM, Simon Riggs wrote:
>>>> So the attached patch fixes both the bug in the recent commit and the
>>>> one I just found by observation of 49bff5300d527, since they are
>>>> related.
>>>>
>>>> StandbyReleaseOldLocks() can sweep in the same way as
>>>> ExpireOldKnownAssignedTransactionIds().
>>>
>>> How will this avoid the bug introduced by your recent commit
>>> (32ac7a11)? After that commit, we no longer consider vacuum's xid in
>>> RunningTransactions->oldestRunningXid calculation, so it is possible
>>> that we still remove/release its lock on standby earlier than
>>> required.
>>
>> In the past, the presence of an xid was required to prevent removal by
>> StandbyReleaseOldLocks().
>>
>> The new patch removes that requirement and removes the lock when the xid is
>> older than oldestxid.
>
> The case I have in mind is: "Say vacuum got xid 600 while truncating,
> and then some other random transactions 601, 602 start modifying the
> database. At this point, I think the computed value of
> RunningTransactions->oldestRunningXid will be 601. Now, on standby,
> when StandbyReleaseOldLocks sees the lock from transaction 600, it
> will remove it, which doesn't appear correct to me."

OK, thanks. Patch attached.

>> The normal path of removal is at commit or abort;
>> StandbyReleaseOldLocks is used almost never (as originally intended).
>
> Okay, but that doesn't prevent us from ensuring that, whenever it is used, it
> does the right thing.

What do you mean? Has anyone argued in favour of doing the wrong thing?

I'm playing around with finding a test case to prove this area works, rather than just manual testing. Suggestions welcome.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

execution_ordering_in_running_xact.v1.patch
Description: Binary data
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 8 June 2018 at 19:03, Andres Freund wrote:
> Hi,
>
> On 2018-06-08 09:23:02 +0100, Simon Riggs wrote:
>> I have also found another bug which affects what we do next.
>>
>> For context, AEL locks are normally removed by COMMIT or ABORT.
>> StandbyReleaseOldLocks() is just a sweeper to catch anything that
>> didn't send an abort before it died, so it hardly ever activates. The
>> coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
>> running list, then we remove it.
>>
>> But that wasn't working correctly either, since as of 49bff5300d527 we
>> assigned AELs to subxids. Subxids weren't in the running list and so
>> AELs held by them would have been removed at the wrong time, an extant
>> bug in PG10. It looks to me like they would have been removed either
>> early or late, up to the next runningxact info record. They would be
>> removed, so no leakage, but the late timing wouldn't be noticeable for
>> tests or most usage, since it would look just like lock contention.
>> Early release might give the same issue of block access to a non-existent
>> block/file.
>
> Yikes, that's kinda bad. It can also just cause plain crashes, no? The
> on-disk / catalog state isn't necessarily consistent during DDL, which
> is why we hold AE locks. At the very least it can cause corruption of
> in-use relcache entries and such. In practice this probably
> hits only a limited number of people because it requires DDL to happen
> in subtransactions, which probably isn't *that* common.

Yep, kinda bad, hence the fix.

>> So the attached patch fixes both the bug in the recent commit and the
>> one I just found by observation of 49bff5300d527, since they are
>> related.
>
> Can we please keep them separate?

The attached patch is separate. It fixes 49bff5300d527 and also the committed code, but should work fine even if we revert. Will double-check.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 8 June 2018 at 19:03, Andres Freund wrote:
>> It seems to have affected Greg.
>
> As far as I can tell Greg was just theorizing?

I'll wait for Greg to say whether this was an actual case that needs to be fixed or not. If not, happy to revert.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 8 June 2018 at 11:33, Amit Kapila wrote:
> On Fri, Jun 8, 2018 at 1:53 PM, Simon Riggs wrote:
>> So the attached patch fixes both the bug in the recent commit and the
>> one I just found by observation of 49bff5300d527, since they are
>> related.
>>
>> StandbyReleaseOldLocks() can sweep in the same way as
>> ExpireOldKnownAssignedTransactionIds().
>
> How will this avoid the bug introduced by your recent commit
> (32ac7a11)? After that commit, we no longer consider vacuum's xid in
> RunningTransactions->oldestRunningXid calculation, so it is possible
> that we still remove/release its lock on standby earlier than
> required.

In the past, the presence of an xid was required to prevent removal by StandbyReleaseOldLocks().

The new patch removes that requirement and removes the lock when the xid is older than oldestxid.

The normal path of removal is at commit or abort; StandbyReleaseOldLocks is used almost never (as originally intended).

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 7 June 2018 at 22:25, Andres Freund wrote:
> On 2018-06-07 14:19:18 -0700, Andres Freund wrote:
> Look at:
>
> void
> ProcArrayApplyRecoveryInfo(RunningTransactions running)
> ...
>     /*
>      * Remove stale locks, if any.
>      *
>      * Locks are always assigned to the toplevel xid so we don't need to care
>      * about subxcnt/subxids (and by extension not about ->suboverflowed).
>      */
>     StandbyReleaseOldLocks(running->xcnt, running->xids);
>
> by excluding running transactions you have, as far as I can tell,
> effectively removed the vacuum truncation AEL from the standby.

I agree, that is correct. There is a bug in my recent commit that causes a small race window that could potentially lead to someone reading the size of a relation just before it is truncated and then falling off the end of the scan, resulting in a block access ERROR, potentially much later than the truncate.

I have also found another bug which affects what we do next.

For context, AEL locks are normally removed by COMMIT or ABORT. StandbyReleaseOldLocks() is just a sweeper to catch anything that didn't send an abort before it died, so it hardly ever activates. The coding of StandbyReleaseOldLocks() is backwards... if it ain't in the running list, then we remove it.

But that wasn't working correctly either, since as of 49bff5300d527 we assigned AELs to subxids. Subxids weren't in the running list and so AELs held by them would have been removed at the wrong time, an extant bug in PG10. It looks to me like they would have been removed either early or late, up to the next runningxact info record. They would be removed, so no leakage, but the late timing wouldn't be noticeable for tests or most usage, since it would look just like lock contention. Early release might give the same issue of block access to a non-existent block/file.

So the attached patch fixes both the bug in the recent commit and the one I just found by observation of 49bff5300d527, since they are related.

StandbyReleaseOldLocks() can sweep in the same way as ExpireOldKnownAssignedTransactionIds().

> I also don't understand why this change would be backpatched in the
> first place. It's a relatively minor efficiency thing, no?

As for everything, that is open to discussion.

Yes, it seems minor to me until it affects you; then it's not. It seems to have affected Greg.

The attached patch, or a later revision, needs to be backpatched to PG10 independently of the recently committed patch.

I have yet to test this manually, but will do so tomorrow morning.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

remove_standby_subxid_locks.v1.patch
Description: Binary data
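For illustration, here is a minimal sketch of the sweep-by-age approach described above, written against the PG10-era recovery lock code (a RecoveryLockList of xl_standby_lock entries). This is an illustrative sketch of the technique, not the attached patch:

    /*
     * Sketch: release every standby AccessExclusiveLock whose holding
     * (sub)xid precedes the oldest running xid, in the same style as
     * ExpireOldKnownAssignedTransactionIds(), instead of testing
     * membership in the running-xids list.
     */
    static void
    StandbyReleaseOldLocksSketch(TransactionId oldxid)
    {
        ListCell   *cell;
        ListCell   *prev = NULL;
        ListCell   *next;

        for (cell = list_head(RecoveryLockList); cell != NULL; cell = next)
        {
            xl_standby_lock *lock = (xl_standby_lock *) lfirst(cell);
            LOCKTAG     locktag;

            next = lnext(cell);

            /* Keep the lock if its xid could still be running */
            if (TransactionIdIsValid(oldxid) &&
                !TransactionIdPrecedes(lock->xid, oldxid))
            {
                prev = cell;
                continue;
            }

            /* The owning (sub)transaction must have ended: sweep it */
            SET_LOCKTAG_RELATION(locktag, lock->dbOid, lock->relOid);
            (void) LockRelease(&locktag, AccessExclusiveLock, true);

            RecoveryLockList = list_delete_cell(RecoveryLockList, cell, prev);
            pfree(lock);
        }
    }

Because a lock is kept whenever its xid is not strictly older than oldxid, a vacuum xid excluded from the running-xids list is no longer released early, and subxid-held AELs are swept at the right time.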
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 7 June 2018 at 22:19, Andres Freund wrote:
> Wonder if the right thing here wouldn't be to instead transiently
> acquire an AEL lock during replay when truncating a relation?

The way AELs are replayed is generic; all AEL requests are handled that way. So yes, you could invent a special case for this specific situation.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 20:27, Tom Lane wrote:
> Simon Riggs writes:
>> If we're going to compress the protocol, it seems sensible to remove
>> extraneous information first.
>
> Breaking the wire protocol was nowhere in this thread.

No, it wasn't. But there is another thread on the subject of compressing libpq, which is what I was referring to.

Andres' patch is clearly very efficient at adding the SELECT tag. I am questioning if we can remove that need altogether.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 19:20, Andres Freund wrote:
> On 2018-06-07 11:40:48 +0100, Simon Riggs wrote:
>> On 7 June 2018 at 11:29, Pavel Stehule wrote:
>>>> Do we actually need the completion tag at all? In most cases?
>>>
>>> affected rows is taken from this value on protocol level
>>
>> I didn't mean we should remove the number of rows. Many things rely on that.
>
> How do you mean it then? We can't really easily change how we return
> that value on the protocol level, and the command tag is basically just
> returned as a string in the protocol. If we were to design the protocol
> today I'm sure we would just declare the rowcount and affected rowcounts
> separate fields or something, but ...

I meant removing the pointless noise word at the start of the tag, which few clients care about. I was thinking of just returning "SQL" instead, but that wasn't after much thought.

But now that I think about it more, returning any fixed string, "SQL" or "SELECT", in the protocol seems like a waste of bandwidth and a potential route in for decrypting the stream.

If we're going to compress the protocol, it seems sensible to remove extraneous information first.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 11:29, Pavel Stehule wrote:
>> Do we actually need the completion tag at all? In most cases?
>
> affected rows is taken from this value on protocol level

I didn't mean we should remove the number of rows. Many things rely on that.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: computing completion tag is expensive for pgbench -S -M prepared
On 7 June 2018 at 06:01, David Rowley wrote:
> On 7 June 2018 at 16:13, Andres Freund wrote:
>> in PortalRun(). That's actually fairly trivial to optimize - we don't
>> need the full blown snprintf machinery here. A quick benchmark
>> replacing it with:
>>
>>     memcpy(completionTag, "SELECT ", sizeof("SELECT "));
>>     pg_lltoa(nprocessed, completionTag + 7);
>
> I'd also noticed something similar with some recent benchmarks I was
> doing for INSERTs into partitioned tables. In my case I saw as high as
> 0.7% of the time spent building the INSERT tag. So I think it's worth
> fixing this.
>
> I think it would be better to invent a function that accepts a
> CmdType, int64 and Oid that copies the tag into the supplied buffer,
> then make a more generic change that also replaces the code in
> ProcessQuery() which builds the tag. I'm sure there must be some way
> to get the CmdType down to the place you've patched so we can get rid
> of the if (strcmp(portal->commandTag, "SELECT") == 0) line too.

Sounds better.

Do we actually need the completion tag at all? In most cases?

Perhaps we should add a parameter to make it optional and turn it off by default, except for psql.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
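Building on David's suggestion, here is a rough sketch of such a helper. The function name and exact shape are hypothetical; it relies only on memcpy() and pg_lltoa(), as in Andres' snippet above:

    /*
     * Hypothetical helper: build a completion tag for a given command
     * type without going through the snprintf machinery. pg_lltoa()
     * writes the decimal digits plus a terminating NUL, overwriting
     * the NUL that memcpy() copied along with the prefix.
     */
    static void
    BuildCompletionTag(char *completionTag, CmdType cmdtype, uint64 nprocessed)
    {
        switch (cmdtype)
        {
            case CMD_SELECT:
                memcpy(completionTag, "SELECT ", sizeof("SELECT "));
                pg_lltoa((int64) nprocessed, completionTag + 7);
                break;
            case CMD_UPDATE:
                memcpy(completionTag, "UPDATE ", sizeof("UPDATE "));
                pg_lltoa((int64) nprocessed, completionTag + 7);
                break;
            case CMD_DELETE:
                memcpy(completionTag, "DELETE ", sizeof("DELETE "));
                pg_lltoa((int64) nprocessed, completionTag + 7);
                break;
            default:
                /* INSERT also embeds an Oid; omitted from this sketch */
                break;
        }
    }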
Re: Possible bug in logical replication.
On 6 June 2018 at 17:22, Alvaro Herrera wrote:
> This thread seems to have died down without any fix being proposed.
> Simon, you own this open item.

Thanks, will look.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 5 June 2018 at 17:14, MauMau wrote:
> Furthermore, an extra hop and double parsing/planning could matter for
> analytic queries, too. For example, SAP HANA boasts of scanning 1
> billion rows in one second. In HANA's scaleout architecture, an
> application can connect to any worker node and the node communicates
> with other nodes only when necessary (there's one special node called
> "master", but it manages the catalog and transactions; it's not an
> extra hop like the coordinator in XL). Vertica is an MPP analytics
> database, but it doesn't have a node like the coordinator, either. To
> achieve maximum performance for real-time queries, the scaleout
> architecture should avoid an extra hop when possible.

Agreed. When possible.

When is it possible? Two ways I know of are:

1. have pre-knowledge in the client of where data is located (difficult to scale)
2. you must put data in all places the client can connect to (i.e. multimaster)

Perhaps there are more?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: plans for PostgreSQL 12
On 4 June 2018 at 06:08, Pavel Stehule wrote:
> 4. optimization of expressions without the necessity to create snapshots -
> experiments
>
> @4 There are lots of non-database expressions in PLpgSQL - like var1 := var1
> + var2 or var1 := var1 + konst. The calculation itself needs about 1% of the
> total expression evaluation time. Almost all the time goes on preparing the
> plan cache, preparing the snapshot, etc. For this case, when no database
> object is used, we don't need to use this infrastructure. I would like to
> measure the performance impact, and test whether these optimizations are
> interesting or not.

Sounds good.

I think this would need to be restricted by operator and datatype, since in general you won't know if the datatype functions need a snapshot or not. Immutable functions for the operators ought to do it, but I think that might not be enough.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
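To make the restriction concrete, here is a sketch of the kind of per-operator test Simon suggests, using catalog-lookup helpers that do exist (get_opcode() and func_volatile()); whether immutability alone is a sufficient condition is exactly the open question:

    /*
     * Sketch: does this operator's underlying function plausibly need
     * no snapshot? Immutable means no database access and the same
     * result for the same inputs.
     */
    static bool
    operator_is_snapshot_free(Oid opno)
    {
        Oid     funcid = get_opcode(opno);  /* pg_operator.oprcode */

        if (!OidIsValid(funcid))
            return false;

        return func_volatile(funcid) == PROVOLATILE_IMMUTABLE;
    }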
Re: I'd like to discuss scaleout at PGCon
On 2 June 2018 at 22:46, Ashutosh Bapat wrote:
>>>> And that is why both XL and the "FDW approach" rely on a central coordinator.
>>>
>>> I don't think we ever specified that the "FDW approach" "relies" on a
>>> central coordinator. One could configure and set up a cluster with
>>> multiple coordinators using FDWs.
>>
>> Yes, of course. You're just misunderstanding me. I'm talking about a
>> query coordinator "role". There can be many coordinator components and
>> they can be spread out in various ways, but for any one SQL query
>> there needs to be one coordinator node. Not a SPOF.
>
> In your earlier mail, which is included above, you mentioned a central
> coordinator for multi-node transaction control and global deadlock
> detection. That doesn't sound like a "query coordinator role". It
> sounds more like GTM, which is an SPOF.

In XL, GTM is a single component managing transaction ids. That has a standby, so it is not a SPOF.

But that is not what I mean. I don't believe that a GTM-style component is necessary in a future in-core scalability solution.

Each incoming query needs to be planned and executed from one coordinator component, with the work then performed across many workers on different nodes (or just one).

We could have coordinator components on each worker node, or we could have a set of coordinator nodes and a set of worker nodes.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 1 June 2018 at 16:56, Ashutosh Bapat wrote:
> On Fri, Jun 1, 2018 at 11:10 AM, Simon Riggs wrote:
>> Using a central coordinator also allows multi-node transaction
>> control, global deadlock detection etc..
>
> But that becomes an SPOF and then we have to configure a standby for
> that. I am not saying that that's a bad design but it's not very good
> for many work-loads. But it would be good if we could avoid any
> "central server" in this configuration.
>
>> And that is why both XL and the "FDW approach" rely on a central coordinator.
>
> I don't think we ever specified that the "FDW approach" "relies" on a
> central coordinator. One could configure and set up a cluster with
> multiple coordinators using FDWs.

Yes, of course. You're just misunderstanding me. I'm talking about a query coordinator "role". There can be many coordinator components and they can be spread out in various ways, but for any one SQL query there needs to be one coordinator node. Not a SPOF.

>> FDWs alone are not enough. It is clear that some more tight coupling
>> is required to get things to work well. For example, supporting SQL
>> query plans that allow for redistribution of data for joins.
>
> I think partitioning + FDW provide the basic infrastructure for
> distributing data and planning queries working with such data. We need
> more glue to support node management and cluster configuration. So I
> agree with your statement. But I think it was clear from the beginning
> that we need more than FDW and partitioning.

No, it wasn't clear. But I'm glad to hear it. It might actually work then.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 1 June 2018 at 04:00, MauMau wrote:
> The SQL processor should be one layer, not two layers.

For OLTP, that would be best. But it would be restricted to single-node requests, leaving you the problem of how you know ahead of time whether an SQL statement is single-node or not.

Using a central coordinator node allows us to hide the decision of single-node/multi-node from the user, which seems essential for general SQL. If you are able to restrict the types of requests users make then we can do direct access to partitions - so there is scope for a single-node API, as Mongo provides.

Using a central coordinator also allows multi-node transaction control, global deadlock detection etc..

And that is why both XL and the "FDW approach" rely on a central coordinator.

FDWs alone are not enough. It is clear that some more tight coupling is required to get things to work well. For example, supporting SQL query plans that allow for redistribution of data for joins.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: I'd like to discuss scaleout at PGCon
On 1 June 2018 at 15:44, Ashutosh Bapat wrote:
> On Thu, May 31, 2018 at 11:00 PM, MauMau wrote:
>> 2018-05-31 22:44 GMT+09:00, Robert Haas:
>>> On Thu, May 31, 2018 at 8:12 AM, MauMau wrote:
>>>> Oh, I didn't know you support the FDW approach mainly for analytics. I
>>>> guessed the first target was OLTP read-write scalability.
>>>
>>> That seems like a harder target to me, because you will have an extra
>>> hop involved -- SQL from the client to the first server, then via SQL
>>> to a second server. The work of parsing and planning also has to be
>>> done twice, once for the foreign table and again for the table. For
>>> longer-running queries this overhead doesn't matter as much, but for
>>> short-running queries it is significant.
>>
>> Yes, that extra hop and double parsing/planning were the killer for
>> our performance goal when we tried to meet our customer's scaleout
>> needs with XL. The application executes 82 DML statements in one
>> transaction. Those DMLs consist of INSERT, UPDATE and SELECT that
>> only access one row with a primary key. The target tables are only
>> a few, so the application PREPAREs a few statements and EXECUTEs them
>> repeatedly. We placed the coordinator node of XL on the same host as
>> the application, and data nodes and GTM on other individual nodes.
>
> I agree that there's double parsing happening, but I am hesitant to
> agree with the double planning claim. We do plan, let's say a join
> between two foreign tables, on the local server, but that's only to
> decide whether it's efficient to join locally or on the foreign
> server. That means we create foreign paths for scans on the foreign
> tables, maybe as many parameterized plans as the number of join
> conditions, and one path for the join pushdown; that's it. We then
> create local join paths, but we need those to decide whether it's
> efficient to join locally and, if yes, which way. But we don't create
> paths for how the foreign server would plan that join. That's not
> double planning, since we do not create the same paths locally and on the
> foreign server.
>
> In order to avoid double parsing, we might want to find a way to pass
> a "normalized" parse tree down to the foreign server. We need to
> normalize the OIDs in the parse tree since those may be different
> across the nodes.

Passing detailed info between servers is exactly what XL does.

It requires us to define a cluster, exactly as XL does.

And yes, it's a good idea to replicate some tables to all nodes, as XL does.

So it seems we have at last some agreement that some of the things XL does are the correct approaches.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: SHOW ALL does not honor pg_read_all_settings membership
On 17 April 2018 at 04:28, Michael Paquier wrote:
> On Mon, Apr 16, 2018 at 02:32:10PM +0200, Laurenz Albe wrote:
>> Now that the dust from the last commitfest is settling, I'll make a second
>> attempt to attract attention for this small bug fix.
>>
>> The original commit was Simon's.
>
> Thanks for the ping.
>
> This was new as of v10, so this cannot be listed as an open item; still, I
> have added it under the section for older bugs, because you are right
> as far as I can see.

OK, agreed, it's a bug.

Any objections to backpatching to v10?

> GetConfigOption is wrong by the way, as restrict_superuser means that
> all members of the group pg_read_all_settings can read
> GUC_SUPERUSER_ONLY params, and not only superusers, so the comment at
> least needs a fix; the variable ought to be renamed as well.

OK

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
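For context, the membership test the fix needs to apply wherever GUC_SUPERUSER_ONLY is enforced looks something like the sketch below. is_member_of_role() and the pg_read_all_settings default role both exist; the exact placement and message are illustrative. Superusers are implicitly members of every role, so they pass this check too:

    /* Sketch: allow pg_read_all_settings members, not only superusers */
    if ((record->flags & GUC_SUPERUSER_ONLY) &&
        !is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_ALL_SETTINGS))
        ereport(ERROR,
                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
                 errmsg("must be superuser or a member of pg_read_all_settings to examine \"%s\"",
                        record->name)));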
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced
On 28 May 2018 at 09:57, Michael Paquier wrote:
> On Fri, May 25, 2018 at 02:12:32PM +0200, Magnus Hagander wrote:
>> I agree that returning 0/0 on this is wrong.
>>
>> However, can this actually occur for any case other than exactly the case
>> of "moving the position to where the position already is"? If I look at the
>> physical slot path at least that seems to be the only case, and in that
>> case I think the correct thing to return would be the new position, and not
>> NULL. If we actually *fail* to move the position, we give an error.
>
> Yes, this only returns InvalidXLogRecPtr if the location could not be
> moved forward. Still, there is more going on here. For a physical
> slot, confirmed_lsn is always 0/0, hence the backward check is never
> applied for it. What I think should be used for the value assigned to
> startlsn is restart_lsn instead. Then what do we do if the position
> cannot be moved: should we raise an error, as my attached patch
> does, or just report the existing position the slot is at?

I don't see why an ERROR would be appropriate.

> A second error that I can see is in pg_logical_replication_slot_advance,
> which does not take the mutex lock of MyReplicationSlot, so concurrent
> callers like those of ReplicationSlotsComputeRequiredLSN (applies to
> physical slots actually) and pg_get_replication_slots() may read false
> data.
>
> On top of that, it seems to me that StartLogicalReplication is reading a
> couple of fields from a slot without taking a lock, so at least
> pg_get_replication_slots() may read incorrect data.
> ReplicationSlotReserveWal is also missing a lock. Those are older than
> v11 though.
>
>> Actually, isn't there also a race there? That is, if we try to move it, we
>> check that we're not trying to move it backwards, and throw an error, but
>> that's checked outside the lock. Then later we actually move it, and check
>> *again* if we try to move it backwards, but if that one fails we return
>> InvalidXLogRecPtr (which can happen in the case of concurrent activity on
>> the slot, I think)? In this case, maybe we should just re-check that and
>> raise an error appropriately?
>
> Er, isn't the take here that ReplicationSlotAcquire is used to lock out any
> replication slot update from other backends? It seems to me
> that what counts at the end is whether a backend PID is associated to a slot
> in memory. If you look at the code paths updating a logical or physical
> slot then those imply that the slot is owned; still, a spinlock needs to
> be taken for concurrent readers.
>
>> (I haven't looked at the logical slot path, but I assume it would have
>> something similar in it)
>
> Found one. All the things I have spotted are in the patch attached.

I think the problem here is that there are no comments explaining how to access the various fields in the structure, so there was no way to check whether the code was good or not. If we add corrective code we should clarify that in comments in the .h file also, as is done for XLogCtl.

Your points look correct to me, well spotted.

I'd like to separate the correction of these issues from the change-of-behavior patch. The earlier issues can be backpatched, but the change of behavior only affects PG11.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
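To make the locking rule concrete, here is a sketch of the pattern a reader of another backend's slot needs to follow. The mutex and data fields are real members of ReplicationSlot; the surrounding variables are illustrative:

    /*
     * Sketch: reading a slot's in-memory state that another backend
     * may update concurrently requires the per-slot spinlock.
     */
    XLogRecPtr  restart_lsn;
    XLogRecPtr  confirmed_flush;

    SpinLockAcquire(&slot->mutex);
    restart_lsn = slot->data.restart_lsn;
    confirmed_flush = slot->data.confirmed_flush;
    SpinLockRelease(&slot->mutex);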
Re: Undo logs
On 24 May 2018 at 23:22, Thomas Munro wrote:
> As announced elsewhere[1][2][3], at EnterpriseDB we are working on a
> proposal to add in-place updates with undo logs to PostgreSQL. The
> goal is to improve performance and resource usage by recycling space
> better.

Cool

> The lowest level piece of this work is a physical undo log manager,
> 1. Efficient appending of new undo data from many concurrent
> backends. Like logs.
> 2. Efficient discarding of old undo data that isn't needed anymore.
> Like queues.
> 3. Efficient buffered random reading of undo data. Like relations.

Like an SLRU?

> [4] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
> [5] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr

I think there are quite a few design decisions there that need to be discussed, so let's crack on and discuss them please.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_replication_slot_advance to return NULL instead of 0/0 if slot not advanced
On 25 May 2018 at 13:12, Magnus Hagander wrote:
> On Fri, May 25, 2018 at 7:28 AM, Michael Paquier wrote:
>> Hi all,
>>
>> When attempting to use pg_replication_slot_advance multiple times on a
>> slot, the caller gets back InvalidXLogRecPtr directly as the result,
>> for example:
>>
>> =# select * from pg_replication_slot_advance('popo', 'FF/0');
>>  slot_name |  end_lsn
>> -----------+-----------
>>  popo      | 0/60021E0
>> (1 row)
>> =# select * from pg_replication_slot_advance('popo', 'FF/0');
>>  slot_name | end_lsn
>> -----------+---------
>>  popo      | 0/0
>> (1 row)
>>
>> Wouldn't it be simpler to return NULL to mean that the slot could
>> not be moved forward? That would be easier to parse for clients.
>> Please see the attached.
>
> I agree that returning 0/0 on this is wrong.

Agreed

> However, can this actually occur for any case other than exactly the case
> of "moving the position to where the position already is"? If I look at the
> physical slot path at least that seems to be the only case, and in that case
> I think the correct thing to return would be the new position, and not NULL.

Docs say "Returns name of the slot and real position to which it was advanced to.", so agreed.

> If we actually *fail* to move the position, we give an error.
>
> Actually, isn't there also a race there? That is, if we try to move it, we
> check that we're not trying to move it backwards, and throw an error, but
> that's checked outside the lock. Then later we actually move it, and check
> *again* if we try to move it backwards, but if that one fails we return
> InvalidXLogRecPtr (which can happen in the case of concurrent activity on
> the slot, I think)? In this case, maybe we should just re-check that and
> raise an error appropriately?

Agreed

> (I haven't looked at the logical slot path, but I assume it would have
> something similar in it)

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PG 11 feature count
On 17 May 2018 at 18:29, Bruce Momjian wrote:
> I regularly track the number of items documented in each major release.
> I use the attached script. You might be surprised to learn that PG 11
> has the lowest feature count of any release back through 7.4:
>
>   7.4  280
>   8.0  238
>   8.1  187
>   8.2  230
>   8.3  237
>   8.4  330
>   9.0  252
>   9.1  213
>   9.2  250
>   9.3  187
>   9.4  217
>   9.5  200
>   9.6  220
>   10   194
>   11   167

It would be useful to combine that with the CF app data showing the number of patches submitted and the number of features rejected. Not available for all time, but certainly goes back a few years now.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Removing unneeded self joins
On 16 May 2018 at 15:10, Tom Lane wrote:
> Simon Riggs writes:
>> What I would add is that I've seen cases where the extra joins do NOT
>> hurt performance, so the extra CPU used to remove the join hurts more
>> than the benefit of removing it. Yes, we tried it.
>
> Interesting. The concern I had was more about the cost imposed on every
> query to detect self-joins and try to prove them useless, even in queries
> where no benefit ensues. It's possible that we can get that down to the
> point where it's negligible; but this says that even the successful-proof
> case has to be very cheap.

What I was advocating was an approach that varies according to the query cost, so we don't waste time trying to tune the heck out of OLTP queries, but for larger queries we might take a more considered approach.

For advanced optimizations that are costly to check for, skip the check if we are already below a cost threshold. The threshold would be a heuristic that varies according to the cost of the check.

I realise that in this case we wouldn't know the full query cost until we've done join planning, so we would need some lower-bound estimate to check whether it's worth trying to remove joins, as sketched below.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
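The gating idea might look like the sketch below. Every name here is hypothetical; the point is only that an expensive proof is attempted when a cheap lower bound on the query's cost clears a threshold scaled to the cost of the check itself:

    /*
     * Hypothetical heuristic: only attempt self-join removal when a
     * cheap lower-bound cost estimate says the query is expensive
     * enough for the check to pay for itself.
     */
    if (estimate_query_cost_lower_bound(root) >
        self_join_check_cost * SELF_JOIN_CHECK_THRESHOLD_FACTOR)
        remove_useless_self_joins(root);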
Re: Removing unneeded self joins
On 16 May 2018 at 11:26, Robert Haas wrote:
> On Wed, May 16, 2018 at 12:08 PM, Tom Lane wrote:
>> Alexander Kuzmenkov writes:
>>> There is a join optimization we don't do -- removing an inner join of a
>>> table with itself on a unique column. Such joins are generated by
>>> various ORMs, so from time to time our customers ask us to look into
>>> this. Most recently, it was discussed on the list in relation to an
>>> article comparing the optimizations that some DBMSs make [1].
>>
>> This is the sort of thing that I always wonder why the customers don't
>> ask the ORM to stop generating such damfool queries. It's *expensive*
>> for us to clean up after their stupidity; almost certainly, it would
>> take far fewer cycles, net, for them to be a bit smarter in the first
>> place.
>
> The trouble, of course, is that the customer didn't write the ORM,
> likely has no idea how it works, and doesn't want to run a modified
> version of it even if they do. If the queries run faster on other
> systems than they do on PostgreSQL, we get dinged -- not unjustly.
>
> Also, I'm not sure that I believe that it's always easy to avoid
> generating such queries. I mean, this case is trivial, so it's easy to
> say, well, just rewrite the query. But suppose that I have a fact
> table over which I've created two views, each of which performs
> various joins between the fact table and various lookup tables. My
> queries are such that I normally need the joins in just one of these
> two views and not the other to fetch the information I care about.
> But every once in a while I need to run a report that involves pulling
> every column possible. The obvious solution is to join the views on
> the underlying table's primary key, but then you get this problem. Of
> course there's a workaround: define a third view that does both sets
> of joins-to-lookup-tables. But that starts to feel like you're
> handholding the database; surely it's the database's job to optimize
> queries, not the user's.
>
> It's been about 10 years since I worked as a web developer, but I do
> remember hitting this kind of problem from time to time and I'd really
> like to see us do something about it. I wish we could optimize away
> inner joins, too, for similar reasons.

I agree with everything you say.

What I would add is that I've seen cases where the extra joins do NOT hurt performance, so the extra CPU used to remove the join hurts more than the benefit of removing it. Yes, we tried it.

More advanced optimizations should only be applied when we've assessed that the likely run time is high enough to make it worth investing in further optimization.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 11 May 2018 at 16:37, Andres Freund wrote:
> On 2018-05-11 14:56:12 +0200, Simon Riggs wrote:
>> On 11 May 2018 at 05:32, Andres Freund wrote:
>>> No. Simon just claimed it's not actually a concern:
>>> https://www.postgresql.org/message-id/canp8+j+vtskphep_gmqmeqdwakst2kbotee0yz-my+agh0a...@mail.gmail.com
>>>
>>> And yes, it got committed without doing squat to address the
>>> architectural concerns.
>>
>> "Squat" means "zero, nothing" to me. So that comment would be inaccurate.
>
> Yes, I know it means that. And I don't see how it is inaccurate.
>
>> I have no problem if you want to replace this with an even better
>> design in a later release.
>
> Meh. The author / committer should get a patch into the right shape

They have done, at length. Claiming otherwise is just trash talk.

As you pointed out, the design of the patch avoids layering violations that could have led to architectural objections. Are you saying I should have ignored your words and rewritten the patch to introduce a layering violation?

What other objection do you think has not been addressed?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 11 May 2018 at 05:32, Andres Freund wrote:
> On 2018-05-10 23:25:58 -0400, Robert Haas wrote:
>> On Thu, Mar 1, 2018 at 2:48 PM, Andres Freund wrote:
>>> I still don't think, as commented upon by Tom and me upthread, that we
>>> want this feature in the current form.
>>
>> Was this concern ever addressed, or did the patch just get committed anyway?
>
> No. Simon just claimed it's not actually a concern:
> https://www.postgresql.org/message-id/canp8+j+vtskphep_gmqmeqdwakst2kbotee0yz-my+agh0a...@mail.gmail.com
>
> And yes, it got committed without doing squat to address the
> architectural concerns.

"Squat" means "zero, nothing" to me. So that comment would be inaccurate.

I've spent a fair amount of time reviewing and grooming the patch, and I am happy that Konstantin has changed his patch significantly in response to those reviews, as well as explaining why the patch is fine as it is.

It's a useful patch that PostgreSQL needs, so all good. I have no problem if you want to replace this with an even better design in a later release.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 17:33, Robert Haas wrote:
> On Wed, May 9, 2018 at 11:20 AM, Simon Riggs wrote:
>> On 9 May 2018 at 16:15, Robert Haas wrote:
>>> On Wed, May 9, 2018 at 11:14 AM, Simon Riggs wrote:
>>>> On 9 May 2018 at 16:10, Tom Lane wrote:
>>>>> Robert Haas writes:
>>>>>> On Wed, May 9, 2018 at 9:08 AM, Simon Riggs wrote:
>>>>>>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>>>>>>> (Maybe they would be virtual or foreign indexes??)
>>>>>
>>>>>> It might be useful to invent the concept of a foreign index, but not
>>>>>> for v11 a month after feature freeze.
>>>>>
>>>>> Yeah. That's a can of worms we can *not* open at this stage.
>>>>
>>>> Lucky nobody suggested that then, eh? Robert's just making a joke.
>>>
>>> Someone did suggest that. It was you.
>>
>> Oh, you weren't joking. I think we're having serious problems with
>> people putting words in my mouth again then.
>>
>> Please show me where I suggested doing anything for v11?
>
> Come on, Simon. It's in the quoted text.

No, it's not.

> I realize you didn't say
> v11 specifically, but this is the context of a patch that is proposed
> as a bug-fix for v11. If you meant that we should apply the patch as
> proposed now, or some other one, and do the other thing later, you
> could have said so.

I think it's reasonable to expect you to interpret my words sensibly, rather than in some more dramatic form where I seem to break rules with every phrase.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Needless additional partition check in INSERT?
On 10 May 2018 at 05:33, David Rowley wrote:
> On 10 May 2018 at 16:13, Amit Langote wrote:
>> The patch to ExecInsert looks good, but I think we also need to do the
>> same thing in CopyFrom.
>
> I think so too.
>
> Updated patch attached.

Patch is good. The cause of this oversight is the lack of comments to explain the original coding, so we need to correct that in this patch, please.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 16:15, Robert Haas wrote:
> On Wed, May 9, 2018 at 11:14 AM, Simon Riggs wrote:
>> On 9 May 2018 at 16:10, Tom Lane wrote:
>>> Robert Haas writes:
>>>> On Wed, May 9, 2018 at 9:08 AM, Simon Riggs wrote:
>>>>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>>>>> (Maybe they would be virtual or foreign indexes??)
>>>
>>>> It might be useful to invent the concept of a foreign index, but not
>>>> for v11 a month after feature freeze.
>>>
>>> Yeah. That's a can of worms we can *not* open at this stage.
>>
>> Lucky nobody suggested that then, eh? Robert's just making a joke.
>
> Someone did suggest that. It was you.

Oh, you weren't joking. I think we're having serious problems with people putting words in my mouth again then.

Please show me where I suggested doing anything for v11?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 16:10, Tom Lane wrote:
> Robert Haas writes:
>> On Wed, May 9, 2018 at 9:08 AM, Simon Riggs wrote:
>>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>>> (Maybe they would be virtual or foreign indexes??)
>
>> It might be useful to invent the concept of a foreign index, but not
>> for v11 a month after feature freeze.
>
> Yeah. That's a can of worms we can *not* open at this stage.

Lucky nobody suggested that then, eh? Robert's just making a joke.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 15:57, Robert Haas wrote:
> For right now, I think the options are (1) throw an ERROR if we
> encounter a foreign table or (2) silently skip the foreign table. I
> think (2) is defensible for non-UNIQUE indexes, because the index is
> just a performance optimization. However, for UNIQUE indexes, at
> least, it seems like we'd better do (1), because a major point of such
> an index is to enforce a constraint; we can't allege that we have such
> a constraint if foreign tables are just silently skipped.

If we can assume an index exists on a foreign table, why can we not just assume a unique index exists?? Why the difference?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 15:26, Arseny Sher wrote:
> Simon Riggs writes:
>
>> How much sense does it make to have a partitioned table with a mix of local
>> and foreign tables?
>
> Well, as much sense as FDW-based sharding has, for instance. It is
> arguable, but it exists.
>
>> Shouldn't the fix be to allow creation of indexes on foreign tables?
>> (Maybe they would be virtual or foreign indexes??)
>
> Similar ideas were discussed at [1]. There was no wide consensus on even
> what problems such a feature would solve. Since currently indexes on
> foreign tables are just forbidden, it seems to me that the best the
> partitioning code can do today is just not create them.
>
> [1] https://www.postgresql.org/message-id/flat/4F62FD69.2060007%40lab.ntt.co.jp#4f62fd69.2060...@lab.ntt.co.jp

Indexes on foreign tables cause an ERROR, so yes, we already just don't create them.

You're suggesting silently skipping the ERROR. I can't see a reason for that.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Indexes on partitioned tables and foreign partitions
On 9 May 2018 at 12:50, Arseny Sher wrote:
> Hi,
>
> 8b08f7d4 added propagation of indexes on partitioned tables to
> partitions, which is very cool. However, index creation also recurses
> down to foreign tables. I doubt this is intentional, as such indexes are
> forbidden as not making much sense; an attempt to create an index on a
> partitioned table with a foreign partition leads to an error
> now. Attached lines fix this.

"Fix"? How much sense does it make to have a partitioned table with a mix of local and foreign tables?

Shouldn't the fix be to allow creation of indexes on foreign tables? (Maybe they would be virtual or foreign indexes??)

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On 28 April 2018 at 09:15, Andres Freund wrote:
> Hi,
>
> On 2018-04-28 08:25:53 -0700, Simon Riggs wrote:
>>> - Use direct IO. Due to architectural performance issues in PG and the
>>> fact that it'd not be applicable for all installations I don't think
>>> this is a reasonable fix for the issue presented here. Although it's
>>> independently something we should work on. It might be worthwhile to
>>> provide a configuration that allows to force DIO to be enabled for WAL
>>> even if replication is turned on.
>>
>> "Use DirectIO" is roughly the same suggestion as "don't trust Linux filesystems".
>
> I want to emphasize that this is NOT a Linux-only issue. It's a problem
> across a number of operating systems, including Linux.

Yes, of course.

>> It would be a major admission of defeat for us to take that as our
>> main route to a solution.
>
> Well, I think we were wrong not to engineer towards DIO. There's just
> too many issues with buffered IO not to have a supported path for
> DIO. But given that it's unrealistic to do so without major work, and
> it wouldn't be applicable for all installations (shared_buffers size becomes
> critical), I don't think it matters that much for the issue discussed
> here.

>> The people I've spoken to so far have encouraged us to continue
>> working with the filesystem layer, offering encouragement of our
>> decision to use filesystems.
>
> There's a lot of people disagreeing with it too.

Specific recent verbal feedback from OpenLDAP was that the project adopted DIO and found no benefit in doing so, with regret the other way from having tried.

The care we need to use for any technique is the same.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On 28 April 2018 at 08:25, Simon Riggs wrote:
> On 27 April 2018 at 15:28, Andres Freund wrote:
>
>> - Add a pre-checkpoint hook that checks for filesystem errors *after*
>> fsyncing all the files, but *before* logging the checkpoint completion
>> record. Operating systems, filesystems, etc. all log the error format
>> differently, but for larger installations it'd not be too hard to
>> write code that checks their specific configuration.
>>
>> While I'm a bit concerned adding user-code before a checkpoint, if
>> we'd do it as a shell command it seems pretty reasonable. And useful
>> even without concern for the fsync issue itself. Checking for IO
>> errors could e.g. also include checking for read errors - it'd not be
>> unreasonable to not want to complete a checkpoint if there'd been any
>> media errors.
>
> It seems clear that we need to evaluate our compatibility not just
> with an OS, as we do now, but with an OS/filesystem combination.
>
> Although people have suggested some approaches, I'm more interested in
> discovering how we can be certain we got it right.
>
> And the end result seems to be that PostgreSQL will be forced, in the
> short term, to declare certain combinations of OS/filesystem
> unsupported, with a clear warning sent out to users.
>
> Adding a pre-checkpoint hook encourages people to fix this themselves
> without reporting issues, so I initially oppose this until we have a
> clearer argument as to why we need it. The answer is not to make this
> issue more obscure, but to make it more public.

Thinking some more, I think I understand, but please explain if not.

We need behavior that varies according to OS and filesystem, which varies per tablespace. We could have that variable behavior using:

* a hook
* a set of GUC parameters that can be set at tablespace level
* a separate config file for each tablespace

My preference would be to avoid a hook.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Postgres, fsync, and OSs (specifically linux)
On 27 April 2018 at 15:28, Andres Freund wrote:
> - Add a pre-checkpoint hook that checks for filesystem errors *after*
> fsyncing all the files, but *before* logging the checkpoint completion
> record. Operating systems, filesystems, etc. all log the error format
> differently, but for larger installations it'd not be too hard to
> write code that checks their specific configuration.
>
> While I'm a bit concerned adding user-code before a checkpoint, if
> we'd do it as a shell command it seems pretty reasonable. And useful
> even without concern for the fsync issue itself. Checking for IO
> errors could e.g. also include checking for read errors - it'd not be
> unreasonable to not want to complete a checkpoint if there'd been any
> media errors.

It seems clear that we need to evaluate our compatibility not just with an OS, as we do now, but with an OS/filesystem combination.

Although people have suggested some approaches, I'm more interested in discovering how we can be certain we got it right.

And the end result seems to be that PostgreSQL will be forced, in the short term, to declare certain combinations of OS/filesystem unsupported, with a clear warning sent out to users.

Adding a pre-checkpoint hook encourages people to fix this themselves without reporting issues, so I initially oppose this until we have a clearer argument as to why we need it. The answer is not to make this issue more obscure, but to make it more public.

> - Use direct IO. Due to architectural performance issues in PG and the
> fact that it'd not be applicable for all installations I don't think
> this is a reasonable fix for the issue presented here. Although it's
> independently something we should work on. It might be worthwhile to
> provide a configuration that allows to force DIO to be enabled for WAL
> even if replication is turned on.

"Use DirectIO" is roughly the same suggestion as "don't trust Linux filesystems".

It would be a major admission of defeat for us to take that as our main route to a solution.

The people I've spoken to so far have encouraged us to continue working with the filesystem layer, offering encouragement of our decision to use filesystems.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Query is over 2x slower with jit=on
On 18 April 2018 at 16:50, Andres Freund wrote:
> On 2018-04-18 17:35:31 +0200, Andreas Joseph Krogh wrote:
>> With jit=on:
>> https://explain.depesz.com/s/vYB
>>  Planning Time: 0.336 ms
>>  JIT:
>>    Functions: 716
>>    Generation Time: 78.404 ms
>>    Inlining: false
>>    Inlining Time: 0.000 ms
>>    Optimization: false
>>    Optimization Time: 43.916 ms
>>    Emission Time: 600.031 ms
>
> Any chance this is a debug LLVM build?
>
>> What's the deal with jit making it slower?
>
> JIT has a cost, and sometimes it's not beneficial. Here our heuristics
> for when to JIT appear to be a bit off. In the parallel world it's worse
> because the JITing is duplicated for parallel workers atm.

Please change the name of the "JIT" parameter to something meaningful to humans before this gets too far into the wild.

SSL is somewhat understandable because it's not a Postgres-private term. geqo is regrettable and we really don't want any more too-short/abbreviated parameter names. Think of our ease of use if every GUC were a TLA.

Thanks

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: reloption to prevent VACUUM from truncating empty pages at the end of relation
On 17 April 2018 at 20:09, Tom Lane wrote:
> Alvaro Herrera writes:
>> Andres was working on a radix tree structure to fix this problem, but
>> that seems to be abandoned now, and it seems a major undertaking. While
>> I agree that the proposed solution is a wart, it seems much better than
>> no solution at all. Can we consider Fujii's proposal as a temporary
>> measure until we fix shared buffers? I'm +1 on it myself.
>
> Once we've introduced a user-visible reloption it's going to be
> practically impossible to get rid of it, so I'm -1. I'd much rather
> see somebody put some effort into the radix-tree idea than introduce
> a kluge that we'll be stuck with, and that doesn't even provide a
> good user experience. Disabling vacuum truncation is *not* something
> that I think we should recommend.

The truncation at the end of VACUUM takes an AccessExclusiveLock, which is already user-visible. Using a radix tree won't alter that.

ISTM the user might be interested in having the *lock* NOT happen, so I am +1 for the suggestion regardless of whether the radix tree ever happens.

The lock itself can be cancelled, so the user would also be interested in explicitly requesting a retry with a separate command/function.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
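For reference, a boolean reloption of the kind Fujii proposes would normally be declared by adding an entry to the static boolRelOpts[] table in src/backend/access/common/reloptions.c. The option name below is hypothetical, but the entry shape matches existing options such as autovacuum_enabled:

    /* Hypothetical entry in boolRelOpts[] */
    {
        {
            "vacuum_truncate",      /* hypothetical option name */
            "Enables vacuum to truncate empty pages at the end of this table",
            RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
            ShareUpdateExclusiveLock
        },
        true                        /* default: keep truncating, as today */
    },

It would then be settable per table with ALTER TABLE ... SET (vacuum_truncate = off).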
Re: Bugs in TOAST handling, OID assignment and redo recovery
On 11 April 2018 at 19:57, Tom Lane wrote:
> Pavan Deolasee writes:
>> Ok. I propose attached patches, more polished this time.
>
> I'll take these, unless some other committer is hot to do so?

Please go ahead.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 7 April 2018 at 18:45, Tom Lane wrote:
> Simon Riggs writes:
>> On 6 April 2018 at 17:22, Bruce Momjian wrote:
>>> My point was that people didn't ask you to work harder on fixing the
>>> patch, but in reverting it. You can work harder on fixing things in the
>>> hope they change their minds, but again, that isn't addressing their
>>> request.
>
>> If Tom or Andres still feel that their concerns have not been
>> addressed over the last few days, I am happy to revert the patch with
>> no further discussion from me in this cycle.
>
> FWIW, I still vote to revert. Even if the patch were now perfect,
> there is not time for people to satisfy themselves of that, and
> we've got lots of other things on our plates.
>
> I'd be glad to participate in a proper review of this when v12
> opens. But right now it just seems too rushed, and I have little
> confidence in it being right.
>
> regards, tom lane
>
> PS: If you do revert, please wrap it up as a single revert commit,
> not a series of half a dozen. You've already put several
> non-buildable states into the commit history as a result of this
> patch, each one of which is a land mine for git bisect testing.
> We don't need more of those. Also, looking at the reverse of the
> reversion commit will provide a handy way of seeing the starting
> point for future discussion of this patch.

Will do. "Commence primary ignition."

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 6 April 2018 at 17:22, Bruce Momjian wrote:
> On Fri, Apr 6, 2018 at 09:21:54AM +0100, Simon Riggs wrote:
>> On 5 April 2018 at 21:02, Bruce Momjian wrote:
>>> Simon, you have three committers in this thread suggesting this patch be
>>> reverted. Are you just going to barrel ahead with the fixes without
>>> addressing their emails?
>>
>> PeterG confirms that the patch works and has the agreed concurrency
>> semantics. Separating out the code allows us to see clearly that we
>> have almost complete test coverage of the code and its features.
>
> My point was that people didn't ask you to work harder on fixing the
> patch, but in reverting it. You can work harder on fixing things in the
> hope they change their minds, but again, that isn't addressing their
> request.

I had understood Tom's post to be raising a discussion about whether to revert.

In my understanding, Tom's complaints were addressed quickly, contrary to what he expected - I was surprised myself at how quickly Pavan was able to address them. That left Andres' complaints, which as I explained hadn't been given in any detail. Once given, Pavan investigated Andres' complaints, explained in detail the results of his work, and said that he doesn't see how the proposed changes would improve anything.

If there had been anything unresolvable before commit I wouldn't have committed it, and if anything were unresolvable after commit I would already have reverted it. And as I explained, this commit was not rushed, and various other negative points made are unfortunately not correct.

Now I can see some people are annoyed, so I'm happy to apologize if I've done things that weren't understood or caused annoyance. Time is short at the end of a CF and tempers fray for us all.

If Tom or Andres still feel that their concerns have not been addressed over the last few days, I am happy to revert the patch with no further discussion from me in this cycle.

If a request to revert occurs, I propose to do this on Wed 11 pm, to avoid getting in the way of other commits and because it will take some time to prepare the revert.

Thanks to everyone for review comments, and well done to Pavan, whatever we decide.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 18:08, Tom Lane wrote:
> Simon Riggs writes:
>> On 4 April 2018 at 17:19, Tom Lane wrote:
>>> BTW, poking around in the grammar, I notice that MergeStmt did not
>>> get added to RuleActionStmt. That seems like a rather serious
>>> omission.
>
>> MERGE isn't a privilege, a trigger action or a policy action. Why
>> would it be in RuleActionStmt?
>
> Because it seems likely that somebody would want to write a rule along
> the lines of "ON UPDATE TO mytable DO INSTEAD MERGE ...".
>
> Looking a little further ahead, one can easily envision somebody
> wanting to do "ON MERGE TO mytable DO INSTEAD something". I'd be
> prepared to give a pass on that for the present, partly because
> it's not very clear what stuff from the original MERGE needs to be
> available to the rule. But the other case seems pretty fundamental.
> MERGE is not supposed to be a utility command IMO, it's supposed to
> be DML, and that means it needs to work anywhere that you could
> write e.g. UPDATE.

MERGE is important because it is implemented by other databases, making it an application portability issue and an important feature for PostgreSQL adoption with real users. Enhancing rules to allow interoperation with MERGE doesn't fall into that same category, so I don't see that it needs to work anywhere you can write UPDATE - that certainly isn't the case with triggers, row-level security policies or privileges.

With that said, we can still discuss it to see if it's possible.

... ON UPDATE TO foo DO INSTEAD MERGE ... would look like this:

    MERGE INTO foo
    USING (what?)
    ON (join condition)
    WHEN MATCHED THEN
        UPDATE

which, if we managed to achieve it, is simply a much poorer version of UPDATE, since MERGE with a single WHEN clause is semantically similar to, but higher overhead than, a single DML operation. So if we implemented it you wouldn't ever use it.

Achieving the marriage between rules and MERGE is made complex by the source table and the join condition. MERGE is different from other DML in that it always has two tables, so it is hard to see how that would work with rules on just one table.

So that leaves the thought of whether we could do a more complex version with some kind of exploitation of multi-DML features. Well, possibly, but I think it's months of work. Adding this requested feature doesn't enhance the original goal of application portability and standards compliance, so there's no way to know when you've got it right. Without a scope or a spec it would be difficult to see where that would go.

I would be most glad to be proved wrong, to allow us to implement something here, and I have no objection to others adding that, though it seems clear that it can't happen in this release.

For this release, in discussion with Stephen and others at the Developer Meeting in Brussels, the addition of features for RLS and partitioning were decided as the priorities for MERGE ahead of other items. This item wasn't mentioned by anyone before, and not at all by any users I've spoken to.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 21:00, Andres Freund wrote:
> And if somebody cleans it up Simon will
> complain that things are needlessly being destabilized (hello xlog.c
> with its 1000+ LOC functions).

Taking this comment as a special point... with other points addressed separately.

...If anybody wants to know what I think they can just ask...

There are various parts of the code that don't have full test coverage. Changing things there is more dangerous, and if it's being done for no reason then that is risk for little reward. That doesn't block it, but it does make me think that people could spend their time better - an issue that concerns me too.

But that certainly doesn't apply to parts of the code like this where we have full test coverage. It may not even apply to recovery, now that we have the ability to check the results of recovery and replication in real time.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 5 April 2018 at 21:02, Bruce Momjian wrote:
> On Thu, Apr 5, 2018 at 11:15:20AM +0100, Simon Riggs wrote:
>> On 4 April 2018 at 21:28, Simon Riggs wrote:
>> > On 4 April 2018 at 21:14, Andres Freund wrote:
>> >
>> >>> The normal way is to make review comments that allow change. Your
>> >>> request for change of the parser data structures is fine and can be
>> >>> done, possibly by Saturday
>> >>
>> >> I did request changes, and you've so far ignored those requests.
>> >
>> > Pavan tells me he has replied to you and is working on specific changes.
>>
>> Specific changes requested have now been implemented by Pavan and
>> committed by me.
>>
>> My understanding is that he is working on a patch for Tom's requested
>> parser changes, will post on other thread.
>
> Simon, you have three committers in this thread suggesting this patch be
> reverted. Are you just going to barrel ahead with the fixes without
> addressing their emails?

PeterG confirms that the patch works and has the agreed concurrency semantics. Separating out the code allows us to see clearly that we have almost complete test coverage of the code and its features.

The fixes being applied are ones proposed by Andres, Tom, Andrew and Jesper. So their emails are being addressed in full, where there is anything to discuss and change. Yes, that work is ongoing, since there is a CF deadline. So Pavan and I are very clearly and publicly addressing their emails and nobody is being ignored.

The architecture of the executor has been described as wrong, but at the time that claim was made there was zero detail behind it, so there was nothing to comment upon. I'm surprised, and somewhat suspicious, that multiple people found anything in that suggestion to agree or disagree with; I'm also surprised that nobody else has pointed out the lack of substance there. Given that the executor manifestly works, has been re-engineered according to PeterG's requests, and that many performance concerns had already been addressed prior to commit, Pavan and I were happy with it.

My proposal to commit the patch was given 5 days ahead of time and no comments were received from anyone, not even PeterG. There was no rush and I personally performed extensive reviews before final commit. And as Robert has often reminded me, committers do not possess a veto that they can simply use to remove patches; there have been no reasonable grounds to revoke anything.

I do have sympathy with the need for good architecture and executor design. I expressed exactly the same last year with the missing executor elements in the partitioning patch. Notably, nobody asked for that to be revoked, not even me, knowing that the executor would require major changes in PG11. The executor for MERGE is in radically better shape.

I see that Andres has now posted something giving some details about his concerns (posted last night, so Pavan and I are only now reading it). AFAICS these are not blocking design issues, simply requests for some moving around of the code. We will of course answer in detail on that thread and discuss further. Given that we have full test coverage, it is reasonable to imagine that can be done relatively quickly.

Given the extremely late notice of those review requests, the relative separation of the MERGE code, and the fact that this is a new feature not related to robustness, it seems reasonable to be granted an extension to complete the changes. We're currently assessing how long that might be, but an additional week seems likely.
The project is blessed with many great developers; I do not claim to be one myself, but I am a diligent committer. It's been said that neither Tom nor Andres thought the code would be committed, so neither looked at it, even though they both knew of my proposed commit of the feature in January, with some caveats at that time. Obviously, when great people see a new thing they will always see ways of doing it better, but that doesn't mean things are below project standards.

It would be useful to have a way for all committers to signal ahead of time what they think is likely to happen, to avoid this confusion, rather than a few private IMs between friends. I've been surprised before by people saying "that's not going in" when they clearly haven't even looked at the code, and yet others think it is all good.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 17:07, Alvaro Herrera wrote: > Simon Riggs wrote: >> On 5 April 2018 at 16:09, Alvaro Herrera wrote: >> > Quick item: parse_clause.h fails cpluspluscheck because it has a C++ >> > keyword as a function argument name: >> > >> > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ >> > before ‘namespace’ >> >List **namespace); >> > ^ >> >> How's this as a fix? > > WFM Pushed -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 16:09, Alvaro Herrera wrote: > Quick item: parse_clause.h fails cpluspluscheck because it has a C++ > keyword as a function argument name: > > ./src/include/parser/parse_clause.h:26:14: error: expected ‘,’ or ‘...’ > before ‘namespace’ >List **namespace); > ^ How's this as a fix? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services fnamespace.v1.patch Description: Binary data
Re: Excessive PostmasterIsAlive calls slow down WAL redo
On 5 April 2018 at 08:23, Heikki Linnakangas wrote: > That seems like an utter waste of time. I'm almost inclined to call that a > performance bug. As a straightforward fix, I'd suggest that we call > HandleStartupProcInterrupts() in the WAL redo loop, not on every record, but > only e.g. every 32 records. That would make the main redo loop less > responsive to shutdown, SIGHUP, or postmaster death, but that seems OK. > There are also calls to HandleStartupProcInterrupts() in the various other > loops, that wait for new WAL to arrive or recovery delay, so this would only > affect the case where we're actively replaying records. +1 -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
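To make the suggestion concrete, a minimal sketch of the pattern - gating the expensive check to every 32nd record - with hypothetical stand-ins for the real routines:

    /* Sketch only, not PostgreSQL source. Both helpers are invented
     * stand-ins: one replays a record, the other performs the
     * expensive interrupt/postmaster-death check. */
    #include <stdbool.h>

    extern bool apply_next_record(void);            /* hypothetical */
    extern bool interrupts_and_postmaster_ok(void); /* hypothetical */

    void
    redo_loop(void)
    {
        unsigned long n = 0;

        while (apply_next_record())
        {
            /* cheap bitmask test; the expensive call runs 1 time in 32 */
            if ((++n & 31) == 0 && !interrupts_and_postmaster_ok())
                break;      /* shutdown request or postmaster death */
        }
    }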
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 13:18, Teodor Sigaev wrote:
>> The variable would become unused in non-assert builds. I see that. But
>> simply removing it is not a solution and I don't think the code will compile
>> that way. We should either rewrite that assertion or put it inside a #ifdef
>> ASSERT_CHECKING block or simply remove that assertion because we already
>> check for relkind in parse_merge.c. Will check.
>
> That is noted by Kyotaro HORIGUCHI
> https://www.postgresql.org/message-id/20180405.181730.125855581.horiguchi.kyotaro%40lab.ntt.co.jp
>
> and his suggestion to use a special macro looks better to me:
> -    char        relkind;
> +    char        relkind PG_USED_FOR_ASSERTS_ONLY;

Thanks both, I already fixed that.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
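For anyone following along, this is the pattern in question; the surrounding context here is invented, but PG_USED_FOR_ASSERTS_ONLY is the real macro, which suppresses unused-variable warnings when assertions are compiled out, so the same code builds cleanly either way:

    /* sketch: a variable read only by the assertion below */
    char        relkind PG_USED_FOR_ASSERTS_ONLY;

    relkind = rel->rd_rel->relkind;     /* "rel" is assumed context */
    Assert(relkind == RELKIND_RELATION ||
           relkind == RELKIND_PARTITIONED_TABLE);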
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 13:19, Jesper Pedersen wrote:
> Hi,
>
> On 04/05/2018 08:04 AM, Simon Riggs wrote:
>>
>> On 5 April 2018 at 12:56, Jesper Pedersen
>> wrote:
>>>
>>> Updated for non-assert build.
>>
>> Thanks, pushed. Sorry to have you wait til v3
>
> That patch was a bit rushed, and cut off too much.

Yes, noted, already fixed. Thanks

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 12:56, Jesper Pedersen wrote: > Hi, > > On 04/05/2018 07:48 AM, Simon Riggs wrote: >>> >>> Updated version due to latest refactoring. >> >> >> Thanks for your input. Removing that seems to prevent compilation. >> >> Did something change in between? >> > > Updated for non-assert build. Thanks, pushed. Sorry to have you wait til v3 -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 12:38, Jesper Pedersen wrote:
> Hi Simon and Pavan,
>
> On 04/04/2018 08:46 AM, Jesper Pedersen wrote:
>>
>> On 03/30/2018 07:10 AM, Simon Riggs wrote:
>>>
>>> No problems found, but moving proposed commit to 2 April pm
>>
>> There is a warning for this, as attached.
>
> Updated version due to latest refactoring.

Thanks for your input. Removing that seems to prevent compilation.

Did something change in between?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 5 April 2018 at 11:31, Pavan Deolasee wrote:
> Attached patch refactors the grammar/parser side per your comments. We no
> longer use InsertStmt/UpdateStmt/DeleteStmt/SelectStmt as part of
> MergeAction. Instead we only collect the necessary information for running
> the INSERT/UPDATE/DELETE actions. Speaking of MergeAction itself, I decided
> to use a new parser-only node named MergeWhenClause and removed unnecessary
> members from the MergeAction node which now gets to planner/executor.

That looks good to me. Simple separation of duty.

> Regarding the original report by Marina I suspect she may have turned
> debug_print_parse=on while running regression. I could reproduce the
> failures in the isolation tests by doing the same. The attached patch however
> passes all tests with the following additional GUCs. So I am more confident
> that we should have got the outfuncs.c support ok.
>
> debug_print_parse=on
> debug_print_rewritten=on
> debug_print_plan=on
>
> Also, I am now running tests with -DCOPY_PARSE_PLAN_TREES
> -DRAW_EXPRESSION_COVERAGE_TEST since the buildfarm had earlier uncovered
> some issues with those flags. No problem there too.

OK, so $OP fixed.

> This now also enforces single VALUES clause in the grammar itself instead of
> doing that check at parse-analyse time. So that's a net improvement too.

OK, that's good. I've updated the docs to show this restriction correctly.

I'll commit this tomorrow morning unless there are further comments or edits.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 4 April 2018 at 21:28, Simon Riggs wrote: > On 4 April 2018 at 21:14, Andres Freund wrote: > >>> The normal way is to make review comments that allow change. Your >>> request for change of the parser data structures is fine and can be >>> done, possibly by Saturday >> >> I did request changes, and you've so far ignored those requests. > > Pavan tells me he has replied to you and is working on specific changes. Specific changes requested have now been implemented by Pavan and committed by me. My understanding is that he is working on a patch for Tom's requested parser changes, will post on other thread. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 5 April 2018 at 07:01, Pavan Deolasee wrote:
>> +/*
>> + * Given OID of the partition leaf, return the index of the leaf in the
>> + * partition hierarchy.
>> + */
>> +int
>> +ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
>> +{
>> +    int     i;
>> +
>> +    for (i = 0; i < proute->num_partitions; i++)
>> +    {
>> +        if (proute->partition_oids[i] == partoid)
>> +            break;
>> +    }
>> +
>> +    Assert(i < proute->num_partitions);
>> +    return i;
>> +}
>>
>> Shouldn't we at least warn in a comment that this is O(N)? And document
>> that it does weird stuff if the OID isn't found?
>
> Yeah, added a comment. Also added a ereport(ERROR) if we do not find the
> partition. There was already an Assert, but maybe ERROR is better.
>
>> Perhaps just introduce a PARTOID syscache?
>
> Probably as a separate patch. Anything more than a handful of partitions is
> anyway known to be too slow and I doubt this code will have any material
> impact on that.

There are a few others trying to change that now, so I think we should consider working on this. A PARTOID syscache sounds like a good approach.

>> diff --git a/src/backend/executor/nodeMerge.c
>> b/src/backend/executor/nodeMerge.c
>> new file mode 100644
>> index 000..0e0d0795d4d
>> --- /dev/null
>> +++ b/src/backend/executor/nodeMerge.c
>> @@ -0,0 +1,575 @@
>> +/*-
>> + *
>> + * nodeMerge.c
>> + *   routines to handle Merge nodes relating to the MERGE command
>>
>> Isn't this file misnamed and it should be execMerge.c? The rest of the
>> node*.c files are for nodes that are invoked via execProcnode /
>> ExecProcNode(). This isn't an actual executor node.
>
> Makes sense. Done. (Now that the patch is committed, I don't know if there
> would be a rethink about changing file names. Maybe not, just raising that
> concern)

My review notes suggested a file called execMerge.c; I didn't spot the filename change. I think it's important to make that change because there is no executor node called Merge. That is especially confusing because there *is* an executor node called MergeAppend, and we want some cognitive distance between those things.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
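For comparison, a hedged sketch of what the syscache approach might look like; SearchSysCache1() and ReleaseSysCache() are real routines, but the PARTOID cache ID does not exist yet and is purely illustrative:

    /* Hypothetical O(1) replacement for the O(N) partition_oids[] scan,
     * if a PARTOID syscache existed. */
    HeapTuple   tuple = SearchSysCache1(PARTOID, ObjectIdGetDatum(partoid));

    if (!HeapTupleIsValid(tuple))
        ereport(ERROR,
                (errmsg("no partition found for OID %u", partoid)));
    /* ... read what is needed from the tuple, then ... */
    ReleaseSysCache(tuple);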
Re: pgsql: New files for MERGE
On 4 April 2018 at 21:14, Andres Freund wrote: >> The normal way is to make review comments that allow change. Your >> request for change of the parser data structures is fine and can be >> done, possibly by Saturday > > I did request changes, and you've so far ignored those requests. Pavan tells me he has replied to you and is working on specific changes. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pgsql: New files for MERGE
On 4 April 2018 at 20:09, Tom Lane wrote:
> [ removing -committers from cc ]
>
> Pavan Deolasee writes:
>> On Thu, Apr 5, 2018 at 12:16 AM, Andres Freund wrote:
>>> How's that an explanation for just going ahead and committing? Without
>>> even commenting on why one thinks the pointed out issues are something
>>> that can be resolved later or somesuch? This has an incredibly rushed
>>> feel to it.
>
>> Anyway, I think your review comments are useful and I've incorporated
>> most of those. Obviously certain things like creating a complete new
>> executor machinery are not practical given where we are in the release cycle
>> and I am not sure if that has any significant advantages over what we have
>> today.
>
> Well, what's on the table is reverting this patch and asking you to try
> again in the v12 cycle. Given Andres' concerns about the executor design,
> and mine about the way the parsing end is built, there's certainly no way
> that that's all getting fixed by Saturday. Given pretty much everybody's
> unhappiness with the way this patch was forced through at the last minute,
> I do not think you should expect that we'll say, "okay, we'll let you ship
> a bad version of MERGE because there's no more time in this cycle".

This version works, with agreed semantics, all fully tested and documented. It's also neat and tight. Look how easy it was for Peter to add WITH semantics on top of it. And it's isolated, so it's not a threat to anybody that doesn't choose to use it. Users want it and will use this; if I didn't know that for certain I wouldn't spend time on it.

> Personally, I didn't think we had consensus on whether the semantics
> are right, let alone on whether this is a satisfactory implementation
> code-wise. I know I've never looked at the patch before today; I did not
> think it was close enough to being committed that I would need to.

I have rejected patches in the past for clearly stated reasons that affect users. I regret that those people didn't discuss their designs before they posted patches and serious holes were found. And in response I provided clear design outlines of what was needed. That is not what is happening here.

The normal way is to make review comments that allow change. Your request for change of the parser data structures is fine and can be done, possibly by Saturday.

If saying "I'm unhappy with something" is sufficient grounds for rejecting a patch, I'm surprised to hear it. There has been no discussion of what exactly would be better, only that what we have is somehow wrong, a point which both Pavan and I dispute, not least because the executor has already been rewritten once at Peter's request.

I was under no pressure at all to commit this. In my opinion this is a good version of MERGE and that is why I committed it. If it were not, I would have no hesitation in pushing back a year or more, if I knew of technical reasons to do so.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 18:51, Tom Lane wrote: > Simon Riggs writes: >> On 4 April 2018 at 17:19, Tom Lane wrote: >>> If the MERGE patch has broken this, I'm going to push back on that >>> and push back on it hard, because it probably means there are a >>> whole bunch of other raw-grammar-output-only node types that can >>> now get past the parser stage as well, as children of these nodes. >>> The answer to that is not to add a ton of new infrastructure, it's >>> "you did MERGE wrong". > >> MERGE contains multiple actions of Insert, Update and Delete and these >> could be output in various debug modes. I'm not clear what meaning we >> might attach to them if we looked since that differs from normal >> INSERTs, UPDATEs, DELETEs, but lets see. > > What I'm complaining about is that that's a very poorly designed parsetree > representation. It may not be the worst one I've ever seen, but it's > got claims in that direction. You're repurposing InsertStmt et al to > do something that's *not* an INSERT statement, but by chance happens > to share some (not all) of the same fields. This is bad because it > invites confusion, and then bugs of commission or omission (eg, assuming > that some particular processing has happened or not happened to subtrees > of that parse node). The most egregious way in which it's a bad idea > is that, as I said, InsertStmt doesn't even exist in post-parse-analysis > trees so far as the normal type of INSERT is concerned. This just opens > a whole batch of ways to screw up. We have some types of raw parse nodes > that are replaced entirely during parse analysis, and we have some others > where it's convenient to use the same node type before and after parse > analysis, but we don't have any that are one way in one context and the > other way in other contexts. And this is not the place to start. > > I think you'd have been better off to fold all of those fields into > MergeAction, or perhaps make several variants of MergeAction. OK, that can be changed, will check and report back tomorrow. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 17:19, Tom Lane wrote:
> Marina Polyakova writes:
>> When debugging is enabled for server logging, isolation tests fail
>> because there're no corresponding output functions for InsertStmt /
>> DeleteStmt / UpdateStmt that are used in the output of the MergeAction
>> nodes (see the attached regressions diffs and output). I also attached an
>> attempt that makes the tests pass. Sorry if I missed that it was already
>> discussed somewhere.
>
> Uh ... what?
>
> Those node types are supposed to appear in raw grammar output only;
> they should never survive past parse analysis.

So if I understand this correctly, it has nothing to do with the isolation tester; that's just where the report came from. Which debug mode are we talking about, please?

> If the MERGE patch has broken this, I'm going to push back on that
> and push back on it hard, because it probably means there are a
> whole bunch of other raw-grammar-output-only node types that can
> now get past the parser stage as well, as children of these nodes.
> The answer to that is not to add a ton of new infrastructure, it's
> "you did MERGE wrong".

MERGE hasn't broken anything. Marina is saying that the debug output for MERGE isn't generated correctly. I accept that it shouldn't give spurious messages and I'm sure we can fix that.

MERGE contains multiple actions of Insert, Update and Delete, and these could be output in various debug modes. I'm not clear what meaning we might attach to them if we looked, since that differs from normal INSERTs, UPDATEs and DELETEs, but let's see.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Add support for printing/reading MergeAction nodes
On 4 April 2018 at 17:19, Tom Lane wrote: > BTW, poking around in the grammar, I notice that MergeStmt did not > get added to RuleActionStmt. That seems like a rather serious > omission. MERGE isn't a privilege, a trigger action or a policy action. Why would it be in RuleActionStmt? Could you explain what command you think should be supported? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
On 4 April 2018 at 03:31, Peter Eisentraut wrote:
> I wonder why this approach was chosen.

I looked at coding it that way and it seemed worse than the way chosen.

> I'm going to try to hack up an alternative approach and see how it works
> out.

If you add a new filter for TRUNCATE it will mean compatibility problems and the need for pg_dump support. We would need a note in the release notes to explain that people will need to add a TRUNCATE filter to their existing publications. I was hoping to have TRUNCATE picked up automatically, which can't happen if we have an explicit filter for it.

>> I know this has been discussed in the thread already, but it really
>> strikes me as wrong to basically do some mini DDL replication feature
>> via per-command WAL records.

I don't really understand that comment.

> I think TRUNCATE is special in some ways and it's reasonable to treat it
> specially. Real DDL is being worked on in the 2PC decoding thread.
> What we are discussing here isn't going to be applicable there and vice
> versa, I think. In fact, one of the reasons for this effort is that in
> BDR TRUNCATE is currently handled like a DDL command, which doesn't work
> well.

Agreed

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PATCH] Logical decoding of TRUNCATE
On 2 April 2018 at 19:30, Peter Eisentraut wrote: >>> *** >>> *** 111,116 CREATE PUBLICATION >> class="parameter">name >>> --- 111,121 >>> and so the default value for this option is >>> 'insert, update, delete'. >>> >>> + >>> +TRUNCATE is treated as a form of >>> +DELETE for the purpose of deciding whether >>> +to publish, or not. >>> + >>> >>> >>> >> >> Why is this a good idea? > > I think it seemed like a good idea at the time, so to speak, but several > people have argued against it, so we should probably change this in the > final version. Who has argued against it? I see that Andres has asked twice about it and been answered twice, but expressed no opinion. Petr has said he thinks that's right. Nobody else has commented. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical decoding of two-phase transactions
On 29 March 2018 at 23:30, Petr Jelinek wrote:
> On 29/03/18 23:58, Andres Freund wrote:
>> On 2018-03-29 23:52:18 +0200, Tomas Vondra wrote:
>>>> I have added details about this in src/backend/storage/lmgr/README as
>>>> suggested by you.
>>>
>>> Thanks. I think the README is a good start, but I think we also need to
>>> improve the comments, which is usually more detailed than the README.
>>> For example, it's not quite acceptable that LogicalLockTransaction and
>>> LogicalUnlockTransaction have about no comments, especially when it's
>>> meant to be public API for decoding plugins.
>>
>> FWIW, for me that's ground to not accept the feature. Burdening output
>> plugins with this will make their development painful (because they'll
>> have to adapt regularly) and correctness doubtful (there's nothing
>> checking for the lock being skipped). Another way needs to be found.
>
> I have to agree with Andres here. It's also visible in the latter
> patches. The pgoutput patch forgets to call these new APIs completely.
> The test_decoding calls them, but it does so even when it's processing
> changes for a committed transaction. I think that should be avoided as it
> means potentially doing an SLRU lookup for every change. So doing it right
> is indeed not easy.

Yet you spotted these problems easily enough. Similar to finding missing LWLocks.

> I was wondering how to hide this. Best idea I had so far would be to put
> it in heap_beginscan (and index_beginscan given that catalog scans use
> it as well) behind some condition. That would also improve performance
> because locking would not need to happen for syscache hits. The problem
> is however how to inform the heap_beginscan about the fact that we are
> in 2PC decoding. We definitely don't want to change all the scan APIs
> for this. I wonder if we could add some kind of property to Snapshot
> which would indicate this fact - logical decoding is using its own
> snapshots, so it could inject the information about being inside the 2PC
> decoding.

Perhaps, but how do we know we've covered all the right places? We don't know what every plugin will require, do we? The plugin needs to take responsibility for its own correctness, whether we make it easier or not.

It seems clear that we would need a generalized API (the proposed locking approach) to cover all requirements.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] logical decoding of two-phase transactions
On 29 March 2018 at 23:24, Andres Freund wrote:
>> I agree with the former, of course - docs are a must. I disagree with
>> the latter, though - there have been about no proposals how to do it
>> without the locking. If there are, I'd like to hear about it.
>
> I don't care. Either another solution needs to be found, or the locking
> needs to be automatically performed when necessary.

That seems unreasonable. It's certainly a nice future goal to have it all happen automatically, but we don't know what the plugin will do. How can we ever make an unknown task happen automatically? We can't.

We have a reasonable approach here. Locking shared resources before using them is not a radical new approach, it's just standard development.

If we find a better way in the future, we can use that, but requiring a better solution when there isn't one is unreasonable.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
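For concreteness, a sketch of the discipline being debated, as an output plugin might have to apply it; the exact signatures of LogicalLockTransaction()/LogicalUnlockTransaction() are assumptions based on the names used in this thread, not a committed API:

    /* Hypothetical plugin usage of the proposed 2PC-decoding locks. */
    static void
    my_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                 Relation relation, ReorderBufferChange *change)
    {
        /* Pin the in-progress transaction so a concurrent ROLLBACK
         * PREPARED cannot remove catalog state from under us. */
        if (!LogicalLockTransaction(txn))
            return;         /* transaction aborted; skip this change */

        /* ... inspect catalogs and emit the change ... */

        LogicalUnlockTransaction(txn);
    }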
Re: [PATCH] Logical decoding of TRUNCATE
On 1 April 2018 at 21:01, Andres Freund wrote:
>> ***
>> *** 111,116 CREATE PUBLICATION name
>> --- 111,121
>>     and so the default value for this option is
>>     'insert, update, delete'.
>>
>> +
>> +    TRUNCATE is treated as a form of
>> +    DELETE for the purpose of deciding whether
>> +    to publish, or not.
>> +
>
> Why is this a good idea?

TRUNCATE removes rows, just as DELETE does, so anybody that wants to publish the removal of rows will be interested in this. This avoids unnecessary overcomplication of the existing interface.

>> + * Write a WAL record to allow this set of actions to be logically
>> decoded.
>> + * We could optimize this away when !RelationIsLogicallyLogged(rel)
>> + * but that doesn't save much space or time.
>
> What you're saying isn't that you're not logging anything, but that
> you're allocating the header regardless? Because this certainly sounds
> like you unconditionally log a WAL record.

It says that, yes - logging unconditionally was my idea, as explained.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
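A hedged illustration of the proposed behaviour; the table name is invented, and the semantics were still under discussion at this point:

    -- Under the patch as posted: a publication that publishes deletes
    -- would publish TRUNCATE too, since both remove rows.
    CREATE PUBLICATION removals_only FOR TABLE accounts
        WITH (publish = 'delete');

    TRUNCATE accounts;  -- replicated as a form of DELETE per this proposal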
Re: hot_standby_feedback vs excludeVacuum and snapshots
On 31 March 2018 at 14:21, Amit Kapila wrote: > On Thu, Mar 29, 2018 at 4:47 PM, Greg Stark wrote: >> I'm poking around to see debug a vacuuming problem and wondering if >> I've found something more serious. >> >> As far as I can tell the snapshots on HOT standby are built using a >> list of running xids that the primary builds and puts in the WAL and >> that seems to include all xids from transactions running in all >> databases. The HOT standby would then build a snapshot and eventually >> send the xmin of that snapshot back to the primary in the hot standby >> feedback and that would block vacuuming tuples that might be visible >> to the standby. >> >> Many ages ago Alvaro sweated blood to ensure vacuums could run for >> long periods of time without holding back the xmin horizon and >> blocking other vacuums from cleaning up tuples. That's the purpose of >> the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible >> because we know vacuums won't insert any tuples that queries might try >> to view and also vacuums won't try to perform any sql queries on other >> tables. >> >> I can't find anywhere that the standby snapshot building mechanism >> gets this same information about which xids are actually vacuums that >> can be ignored when building a snapshot. >> > > I think the vacuum assigns xids only if it needs to truncate some of > the pages in the relation which happens towards the end of vacuum. > So, it shouldn't hold back the xmin horizon for long. Yes, that's the reason. I recall VACUUMs giving lots of problems during development of Hot Standby. VACUUM FULL was the thing that needed to be excluded in the past because it needed an xid to move rows. Greg's concern is a good one and his noticing that we hadn't specifically excluded VACUUMs is valid, so we should exclude them. Well spotted, Greg. So although this doesn't have the dramatic effect it might have had, there is still the possibility of some effect and I think we should treat it as a bug. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services ignore_lazy_vacuums_in_RunningTransactionData.v1.patch Description: Binary data
Re: [HACKERS] MERGE SQL Statement for PG11
On 29 March 2018 at 10:50, Simon Riggs wrote: > On 28 March 2018 at 12:00, Pavan Deolasee wrote: > >> v27 attached, though review changes are in >> the add-on 0005 patch. > > This all looks good now, thanks for making all of those changes. > > I propose [v27 patch1+patch3+patch5] as the initial commit candidate > for MERGE, with other patches following later before end CF. > > I propose to commit this tomorrow, 30 March, about 26 hours from now. > That will allow some time for buildfarm fixing/reversion before the > Easter weekend, then other patches to follow starting 2 April. That > then gives reasonable time to follow up on other issues that we will > no doubt discover fairly soon after commit, such as additional runs by > SQLsmith and more eyeballs. No problems found, but moving proposed commit to 2 April pm -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 23:16, Tomas Vondra wrote:
>
> On 03/29/2018 11:18 PM, Tom Lane wrote:
>> Tomas Vondra writes:
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we should be able to deduce if two records are
>>> immediately after each other.
>>
>> Per my point earlier, XLOG_SWITCH is sufficient to defeat that argument.
>
> But the SWITCH record will be the last record in the WAL segment (and if
> there happens to be a WAL record after it, it will have an invalid xl_curr
> pointer). And the next valid record will be the first one in the next
> WAL segment. So why wouldn't that be enough information?
>
>> Also consider a timeline fork. It's really hard to be sure which record
>> in the old timeline is the direct ancestor of the first one in the new
>> if you lack xl_prev:
>>
>>     A1 -> B1 -> C1 -> D1
>>             \
>>              B2 -> C2 -> D2
>>
>> If you happened to get confused and think that C2 is the first in its
>> timeline, diverging off the old line after B1 not A1, there would be
>> nothing about C2 to disabuse you of your error.
>
> Doesn't that mean that B1 and B2 have to be exactly the same size?
> Otherwise there would be a gap between B1/C2 or B2/C1, and the xl_curr
> would be enough to detect this.
>
> And how could xl_prev detect it? AFAIK XLogRecPtr does not include
> TimeLineID, so xl_prev would be the same for both B1 and B2.
>
> I admit WAL internals are not an area I'm particularly familiar with,
> though, so I may be missing something utterly obvious.

Timeline history files know the LSN at which they fork. I can imagine losing the timeline branch point metadata because of corruption or other disaster, but getting the LSN slightly wrong sounds strange. If it was slightly wrong and yet still a valid LSN, xl_prev provides no protection against that.

The timeline branch point is already marked by specific records in WAL, so if you forgot the LSN completely you would search for those and then you'd know. xl_prev wouldn't assist you in that search.

I agree with the points about XLOG_SWITCH, adding again the observation that switch records are no longer needed for replication, and that if they are used alongside replication they have a bad effect on performance. I think it would be easily possible to add some more detail to the WAL stream if needed.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 21:03, Tomas Vondra wrote:
>
> On 03/29/2018 08:02 PM, Robert Haas wrote:
>> On Thu, Mar 29, 2018 at 1:13 PM, Tomas Vondra
>> wrote:
>>> On 03/29/2018 06:42 PM, Tom Lane wrote:
>>>> The long and the short of it is that this is a very dangerous-looking
>>>> proposal, we are at the tail end of a development cycle, and there are
>>>> ~100 other patches remaining in the commitfest that also have claims
>>>> on our attention in the short time that's left. If you're expecting
>>>> people to spend more time thinking about this now, I feel you're being
>>>> unreasonable.
>>>
>>> I agree.
>>
>> +1.
>>
>>>> Also, I will say it once more: this change DOES decrease robustness.
>>>> It's like blockchain without the chain aspect, or git commits without
>>>> a parent pointer. We are not only interested in whether individual
>>>> WAL records are valid, but whether they form a consistent series.
>>>> Cross-checking xl_prev provides some measure of confidence about that;
>>>> xl_curr offers none.
>>>
>>> Not sure.
>>>
>>> If each WAL record has xl_curr, then we know to which position the
>>> record belongs (after verifying the checksum). And we do know the size
>>> of each WAL record, so we should be able to deduce if two records are
>>> immediately after each other. Which I think is enough to rebuild the
>>> chain of WAL records.
>>>
>>> To defeat this, this would need to happen:
>>>
>>> a) the WAL record gets written to a different location
>>> b) the xl_curr gets corrupted in sync with (a)
>>> c) the WAL checksum gets corrupted in sync with (b)
>>> d) the record overwrites existing record (same size/boundaries)
>>>
>>> That seems very much like xl_prev.
>>
>> I don't think so, because this ignores, for example, timeline
>> switches, or multiple clusters accidentally sharing an archive
>> directory.
>
> I'm curious: how does xl_prev prevent either of those issues (in a way
> that xl_curr would not)?
>
>> TBH, I'm not entirely sure how likely corruption would be under this
>> proposal in practice, but I think Tom's got the right idea on a
>> theoretical level. As he says, there's a reason why things like
>> block chains and git commits include something about the previous
>> record in what gets hashed to create the identifier for the new
>> record. It ties those things together in a way that doesn't happen
>> otherwise. If somebody proposed changing git so that the SHA
>> identifier for a commit didn't hash the commit ID for the previous
>> commit, that would break the security of the system in all kinds of
>> ways: for example, an adversary could maliciously edit an old commit
>> by just changing its SHA identifier and that of its immediate
>> descendant. No other commit IDs would change and integrity checks
>> would not fail -- there would be no easy way to notice that something
>> bad had happened.
>
> That seems like a rather poor analogy, because we're not protected
> against such adversarial modifications with xl_prev either. I can damage
> the previous WAL record, update its checksum and you won't notice anything.
>
> My understanding is that xl_curr and xl_prev give us pretty much the
> same amount of information, except that xl_prev provides it explicitly
> while with xl_curr it's implicit and we need to deduce it.
>
>> Now, in our case, the chances of problems are clearly a lot more
>> remote because of the way that WAL records are packed into files.
>> Every WAL record has a place where it's supposed to appear in the WAL >> stream, and positions in the WAL stream are never intentionally >> recycled (except in PITR scenarios, I guess). Because of that, the >> proposed xl_curr check provides a pretty strong cross-check on the >> validity of a record even without xl_prev. I think there is an >> argument to be made - as Simon is doing - that this check is in fact >> strong enough that it's good enough for government work. He might be > > Is he? I think the claims in this thread were pretty much that xl_curr > and xl_prev provide the same level of protection. Yes, the blockchain analogy breaks down because we don't include previous CRC, only xl_prev. The existing WAL format is easy enough to repair, falsify etc. You've still got to keep your WAL data secure. So the image of a CSI-style chain-of-evidence thing happening isn't technical reality. You could change the existing format to include the previous CRC in the calculation of the current record's CRC. I'd be happy with that as an option, just as much as I'd support an option for high performance as I've proposed here. But bear in mind that including the CRC would mean that you couldn't repair data damaged by data corruption because it would mean all WAL after the corrupt page would be effectively scrambled. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 19:02, Robert Haas wrote: > >>> Also, I will say it once more: this change DOES decrease robustness. >>> It's like blockchain without the chain aspect, or git commits without >>> a parent pointer. We are not only interested in whether individual >>> WAL records are valid, but whether they form a consistent series. >>> Cross-checking xl_prev provides some measure of confidence about that; >>> xl_curr offers none. >> >> Not sure. >> >> If each WAL record has xl_curr, then we know to which position the >> record belongs (after verifying the checksum). And we do know the size >> of each WAL record, so we should be able to deduce if two records are >> immediately after each other. Which I think is enough to rebuild the >> chain of WAL records. >> >> To defeat this, this would need to happen: >> >> a) the WAL record gets written to a different location >> b) the xl_curr gets corrupted in sync with (a) >> c) the WAL checksum gets corrupted in sync with (b) >> d) the record overwrites existing record (same size/boundaries) >> >> That seems very much like xl_prev. > > I don't think so, because this ignores, for example, timeline > switches, or multiple clusters accidentally sharing an archive > directory. I'm not hearing any actual technical problems. > Given where we are > in the release cycle, I'd prefer to defer that question to next cycle. Accepted, on the basis of risk. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 18:13, Tomas Vondra wrote:
> On 03/29/2018 06:42 PM, Tom Lane wrote:
>> Simon Riggs writes:
>>> I know the approach is new and surprising but I thought about it a lot
>>> before proposing it and I couldn't see any holes; still can't. Please
>>> give this some thought so we can get comfortable with this idea and
>>> increase performance as a result. Thanks.
>>
>> The long and the short of it is that this is a very dangerous-looking
>> proposal, we are at the tail end of a development cycle, and there are
>> ~100 other patches remaining in the commitfest that also have claims
>> on our attention in the short time that's left. If you're expecting
>> people to spend more time thinking about this now, I feel you're being
>> unreasonable.
>
> I agree.

We require consensus on such things for good reason. Risk is an issue and everyone must be comfortable. If it's a good idea now it will still be a good idea in the next cycle.

>> Also, I will say it once more: this change DOES decrease robustness.
>> It's like blockchain without the chain aspect, or git commits without
>> a parent pointer. We are not only interested in whether individual
>> WAL records are valid, but whether they form a consistent series.
>> Cross-checking xl_prev provides some measure of confidence about that;
>> xl_curr offers none.
>
> Not sure.
>
> If each WAL record has xl_curr, then we know to which position the
> record belongs (after verifying the checksum). And we do know the size
> of each WAL record, so we should be able to deduce if two records are
> immediately after each other. Which I think is enough to rebuild the
> chain of WAL records.
>
> To defeat this, this would need to happen:
>
> a) the WAL record gets written to a different location
> b) the xl_curr gets corrupted in sync with (a)
> c) the WAL checksum gets corrupted in sync with (b)
> d) the record overwrites existing record (same size/boundaries)
>
> That seems very much like xl_prev.

That's how I see it too. But I also see that this feels different from what we had before, and it takes time either to come up with a concrete, quantifiable reason not to do it, or to accept that it is OK.

So, given my observation that no technical objection remains, I'm happy to move this to the next CF to give us more time and less risk.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 29 March 2018 at 01:50, Robert Haas wrote:
> On Tue, Mar 27, 2018 at 11:53 PM, Pavan Deolasee
> wrote:
>> Yeah, we should not do that. The patch surely does not intend to replay any
>> more WAL than what we do today. The idea is to just use a different
>> mechanism to find the prior checkpoint. But we should surely find the latest
>> prior checkpoint. In the rare scenario that Tom showed, we should just throw
>> an error and fix the patch if it's not doing that already.
>
> It's not clear to me that there is any reasonable fix short of giving
> up on this approach altogether. But I might be missing something.

(Sorry to come back to this late)

Yes, a few things have been missed in this discussion.

Michael's point is invalid. The earlier discussion said that *replay* of WAL earlier than the last checkpoint could lead to corruption, but this doesn't mean the records themselves are in some way damaged structurally, only that their content can lead to inconsistency in the database. The xl_curr proposal does not replay WAL records; it only searches through them to locate the last checkpoint.

Tom also said "Well, the point of checkpoints is that WAL data before the last one should no longer matter anymore, isn't it?". WAL files prior to the one containing the last checkpoint can be discarded, but checkpoints themselves do nothing to invalidate the WAL records before the checkpoint.

Tom's point that a corruption in the WAL file could prevent pg_rewind from locating a record is valid, but misses something. If the WAL file contained corruption after the last checkpoint, that would already, in the current code, prevent pg_rewind from moving backwards to find the last checkpoint. So whether you go forwards or backwards it is possible to be blocked by corrupt WAL records.

Most checkpoints are far enough apart that there is only one checkpoint record in any WAL file, and it could be anywhere in the file, so the expected number of records to be searched by forwards or backwards scans is equal. And the probability that corruption occurs before or after the record is also equal.

Given the above points, I don't see any reason to believe that the forwards search mechanism proposed is faulty, decreases robustness or is even slower. Overall, the loss of links between WAL records is not a loss of robustness. XLOG_SWITCH is a costly operation and mostly unusable these days, so it shouldn't be a reason to shy away from this patch. We don't jump around in WAL in any other way, so sequential records are just fine.

So IMHO the proposed approach is still very much viable. I know the approach is new and surprising but I thought about it a lot before proposing it and I couldn't see any holes; still can't. Please give this some thought so we can get comfortable with this idea and increase performance as a result. Thanks.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
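To make the forwards search concrete, a sketch of the idea (not pg_rewind source; all three helpers are hypothetical stand-ins for the real WAL-reading routines):

    /* Scan forwards from a known-good starting point, remembering the
     * most recent checkpoint record seen. */
    #include <stdbool.h>
    #include <stdint.h>

    extern bool     read_record_at(uint64_t lsn);   /* load + CRC check */
    extern bool     is_checkpoint_record(uint64_t lsn);
    extern uint64_t next_record_lsn(uint64_t lsn);  /* lsn + record size */

    uint64_t
    find_last_checkpoint(uint64_t start_lsn)
    {
        uint64_t    lsn = start_lsn;
        uint64_t    last_checkpoint = 0;    /* 0 means "none found" */

        while (read_record_at(lsn))
        {
            if (is_checkpoint_record(lsn))
                last_checkpoint = lsn;      /* keep only the latest */
            lsn = next_record_lsn(lsn);
        }
        return last_checkpoint;
    }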
Re: [HACKERS] MERGE SQL Statement for PG11
On 28 March 2018 at 12:00, Pavan Deolasee wrote: > v27 attached, though review changes are in > the add-on 0005 patch. This all looks good now, thanks for making all of those changes. I propose [v27 patch1+patch3+patch5] as the initial commit candidate for MERGE, with other patches following later before end CF. I propose to commit this tomorrow, 30 March, about 26 hours from now. That will allow some time for buildfarm fixing/reversion before the Easter weekend, then other patches to follow starting 2 April. That then gives reasonable time to follow up on other issues that we will no doubt discover fairly soon after commit, such as additional runs by SQLsmith and more eyeballs. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] A design for amcheck heapam verification
On 29 March 2018 at 01:49, Peter Geoghegan wrote:
>>> IndexBuildHeapRangeScan() doesn't mention anything about CIC's heap
>>> ShareUpdateExclusiveLock (it just says SharedLock), because that lock
>>> strength doesn't have anything to do with IndexBuildHeapRangeScan()
>>> when it operates with an MVCC snapshot. I think that this means that
>>> this patch doesn't need to update comments within
>>> IndexBuildHeapRangeScan(). Perhaps that's a good idea, but it seems
>>> independent.
>>
>> Ok, I agree. But note that we are now invoking that code with
>> AccessShareLock() whereas the existing callers either use ShareLock or
>> ShareUpdateExclusiveLock. That probably does not matter, but it's a change
>> worth noting.
>
> Fair point, even though the ShareUpdateExclusiveLock case isn't
> actually acknowledged by IndexBuildHeapRangeScan(). Can we leave this
> one up to the committer, too? I find it very hard to argue either for
> or against this, and I want to avoid "analysis paralysis" at this
> important time.

The above discussion doesn't make sense to me; I hope someone will explain.

I understand we are adding a check to verify the heap against the index, and that this will take longer than before. When it runs, does it report progress via pg_stat_activity, so we can monitor how long it will take?

Locking is also an important concern. If we need a ShareLock to run the extended check and the check runs for a long time, when would we decide to run it? This sounds like it could cause a production outage, so what are the pre-conditions that would lead us to say "we'd better run this"?

For example, if running this is known to be significantly faster than running CREATE INDEX, that might be an argument for someone to run this first if index corruption is suspected.

If it detects an issue, can it fix the issue for the index by injecting correct entries? If not, then we will have to run CREATE INDEX afterwards anyway, which makes it more likely that people would just run CREATE INDEX and not bother with the check.

So my initial questions are about when we would run this, and making sure that is documented.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
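For reference, the verification under discussion is invoked roughly like this per the posted patch series (the heapallindexed argument could still change before commit; the index name is just an example):

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- AccessShareLock variant: check the index structure, then verify
    -- that every heap tuple has a matching index entry.
    SELECT bt_index_check('pgbench_accounts_pkey'::regclass,
                          heapallindexed => true);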
Re: Commit fest 2017-11
On 29 March 2018 at 09:19, Magnus Hagander wrote:
>
> On Thu, Mar 29, 2018 at 10:06 AM, Alexander Korotkov
> wrote:
>>
>> Hi, Fabien!
>>
>> On Fri, Dec 1, 2017 at 9:10 AM, Fabien COELHO wrote:
>>>>
>>>> And the last 21 patches have been classified as well. Here is the
>>>> final score for this time:
>>>> Committed: 55.
>>>> Moved to next CF: 103.
>>>> Rejected: 1.
>>>> Returned with Feedback: 47.
>>>> Total: 206.
>>>>
>>>> Thanks to all the contributors for this session! The CF is now closed.
>>>
>>> Thanks for the CF management.
>>>
>>> Attached a small graph of the end status of patches at the end of each CF.
>>
>> Thank you for the graph!
>> It would be interesting to see statistics not by patch counts, but by
>> their complexity. For a rough measure of complexity we can use the number
>> of affected lines. I expect the statistics would be even more distressing:
>> small patches can be committed faster, while large patches traverse from
>> one CF to another for a long time. It would be interesting to check
>> whether that's true...
>
> I think that's very hard to do given that we simply don't have the data
> today. It's not that simple to analyze the patches in the archives, because
> some are a single file, some are spread across multiple files, etc. I fear
> that anything trying to work off that would actually make the stats even
> more inaccurate. This is the pattern I've seen whenever I've tried that
> before.
>
> I wonder if we should consider adding a field to the CF app *specifically*
> to track things like this. What I'm thinking is a field that's set (or at
> least verified) by the person who flags a patch as committed with choices
> like Trivial/Simple/Medium/Complex (trivial being things like typo fixes
> etc, which today can hugely skew the stats).
>
> Would people who update the CF app be willing to put in that effort (however
> small it is, it has to be done consistently to be of any value) in order to
> be able to track such statistics?
>
> It would only help for the future of course, unless somebody wants to go
> back and backfill existing patches with such information (which they might
> be).

The focus of this is on the committers, which seems wrong. I suggest someone do another analysis showing how many patch reviews have been conducted by patch authors, so we can highlight people who are causing the problem yet not helping to solve it.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 29 March 2018 at 07:37, Pavan Deolasee wrote:
>
> On Tue, Mar 27, 2018 at 5:00 PM, Simon Riggs wrote:
>>
>> In terms of further performance optimization, if there is just one
>> WHEN AND condition and no unconditional WHEN clauses then we can add
>> the WHEN AND easily to the join query.
>>
>> That seems like an easy thing to do for PG11
>
> I think we need to be careful in terms of what can be pushed down to the
> join, in the presence of WHEN NOT MATCHED actions. If we push the WHEN AND
> qual to the join then I am worried that some rows which should have been
> reported "matched" and later filtered out as part of the WHEN quals, will
> get reported as "not-matched", thus triggering the WHEN NOT MATCHED action.
>
> postgres=# EXPLAIN ANALYZE MERGE INTO target t USING source s ON t.a = s.a
> WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b WHEN NOT MATCHED THEN
> INSERT VALUES (s.a, -1);

That has an unconditional WHEN clause, so it would block the push-down under the rule I stated above.

With something like this

    MERGE INTO target t USING source s ON t.a = s.a
    WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b;

or this

    MERGE INTO target t USING source s ON t.a = s.a
    WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b
    WHEN NOT MATCHED THEN DO NOTHING;

or this

    MERGE INTO target t USING source s ON t.a = s.a
    WHEN MATCHED AND t.a < 2 THEN UPDATE SET b = s.b
    WHEN MATCHED THEN DO NOTHING
    WHEN NOT MATCHED THEN DO NOTHING;

then we can push down "t.a < 2" into the WHERE clause of the join query.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
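Under that rule, the qual could be folded into the join along these lines - a sketch of the internal rewrite, not user-visible syntax:

    -- With only conditional WHEN MATCHED clauses (and no action for
    -- everything else), rows failing "t.a < 2" require no action at
    -- all, so the qual can safely become part of the join:
    SELECT t.ctid, s.*
    FROM   target t
    JOIN   source s ON t.a = s.a
    WHERE  t.a < 2;     -- the pushed-down WHEN MATCHED AND qual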
Re: [HACKERS] logical decoding of two-phase transactions
On 28 March 2018 at 16:28, Nikhil Sontakke wrote: > Simon, 0003-Add-GID-and-replica-origin-to-two-phase-commit-abort.patch > is the exact patch that you had posted for an earlier commit. 0003 Pushed -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 23 March 2018 at 15:54, Simon Riggs wrote: > So please could you make the change? Committed, but I still think that change would be good. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 27 March 2018 at 14:58, Tom Lane wrote:
> The point of the xl_prev links is, essentially, to allow verification
> that we are reading a coherent series of WAL records, ie that record B
> which appears to follow record A actually is supposed to follow it.
> This throws away that cross-check just as effectively as the other patch
> did, leaving us reliant solely on the per-record CRCs. CRCs are good,
> but they do have a false-positive rate.

You aren't reliant solely on the CRC. If you have a false-positive CRC then xl_curr allows you to validate the record, just as you would with xl_prev. The xl_curr value points to itself using just as many bits as the xl_prev field, so the probability of false validation is the same as now.

xl_prev does allow you to verify that the WAL records form an unbroken chain. The question is whether that has value, because it certainly has a very clear cost in terms of write performance.

Is there some danger that we would skip records? Where does that risk come from? If someone *wanted* to skip records for whatever reason, they could achieve that with either xl_prev or xl_curr, so there is no difference there.

> An example of a case where xl_prev makes one feel a lot more comfortable
> is the WAL-segment-switch option. The fact that the first record of the
> next WAL file links back to the XLOG_SWITCH record is evidence that
> ignoring multiple megabytes of WAL was indeed the intended thing to do.
> An xl_curr field cannot provide any evidence as to whether WAL records
> were missed.

That's a good point, though XLOG_SWITCH is very rarely used now that we have streaming replication. We can handle it as a special case. Perhaps we can make each file header point back to the previous WAL record.

>> 2. Does the new logic in pg_rewind to search backward for a checkpoint
>> work reliably, and will it be slow?
>
> If you have to search backwards, this breaks it. Full stop.

You don't have to search backwards. We only need to locate the last checkpoint record.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
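As a sketch of the difference (struct and field names invented for illustration):

    /* Both checks spend the same header bits: xl_prev ties a record to
     * its predecessor, xl_curr ties a record to its own address. */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct WalRecordHdrSketch
    {
        uint64_t    xl_prev;    /* LSN of previous record (current format) */
        uint64_t    xl_curr;    /* LSN of this record (proposed format) */
    } WalRecordHdrSketch;

    /* current format: requires knowing where the previous record was */
    static inline bool
    position_ok_prev(const WalRecordHdrSketch *r, uint64_t prev_lsn)
    {
        return r->xl_prev == prev_lsn;
    }

    /* proposed format: the record validates its own location */
    static inline bool
    position_ok_curr(const WalRecordHdrSketch *r, uint64_t this_lsn)
    {
        return r->xl_curr == this_lsn;
    }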
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 March 2018 at 11:46, Simon Riggs wrote: > On 27 March 2018 at 10:31, Pavan Deolasee wrote: > >> Fixed in v26. > > More comments on v26 In terms of further performance optimization, if there is just one WHEN AND condition and no unconditional WHEN clauses then we can add the WHEN AND easily to the join query. That seems like an easy thing to do for PG11 -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 March 2018 at 10:31, Pavan Deolasee wrote:
> Fixed in v26.

More comments on v26

* Change errmsg "Ensure that not more than one source rows match any one
  target row" to "Ensure that not more than one source row matches any
  one target row"

* I think we need an example in the docs showing a constant source query,
  so people can understand how to use MERGE for OLTP as well as large ELT

* Long comment in ExecMerge() needs rewording, formatting and a spell
  check. I suggest not referring to an "order", since that concept
  doesn't exist anywhere else

* Need tests for coverage of these ERROR messages
    Named security policy violation
    SELECT not allowed in MERGE INSERT...
    Multiple VALUES clauses not...
    MERGE is not supported for this...
    MERGE is not supported for relations with inheritance
    MERGE is not supported for relations with rules

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 27 March 2018 at 10:28, Pavan Deolasee wrote:
>> 1. In ExecMergeMatched() we have a line of code that does this...
>>
>>   if (TransactionIdIsCurrentTransactionId(hufd.xmax))
>>   then errcode(ERRCODE_CARDINALITY_VIOLATION)
>>
>> I notice this is correct, but too strong. It should be possible to run
>> a sequence of MERGE commands inside a transaction, repeatedly updating
>> the same set of rows, as is possible with UPDATE.
>>
>> We need to check whether the xid is the current subxid and the cid is
>> the current commandid, rather than using
>> TransactionIdIsCurrentTransactionId()
>
> AFAICS this is fine because we invoke that code only when
> HeapTupleSatisfiesUpdate returns HeapTupleSelfUpdated, i.e. for the case
> when the tuple is updated by our transaction after the scan is started.
> HeapTupleSatisfiesUpdate already checks for command id before returning
> HeapTupleSelfUpdated.

Cool.

>> 5. I take it the special code for TransformMerge target relations is
>> replaced by "right_rte"? Seems fragile to leave it like that. Can we
>> add an Assert()? Do we care?
>
> I didn't get this point. Can you please explain?

The code confused me at that point. More docs pls.

>> 6. I didn't understand "Assume that the top-level join RTE is at the
>> end. The source relation is just before that."
>> What if there is no source relation?
>
> Can that happen? I mean shouldn't there always be a source relation? It
> could be a subquery or a function scan or just a plain relation, but
> something, right?

Yes, OK. So the ordering is target, source, join.

>> 10. Comment needs updating for changes in code below it - "In MERGE
>> when and condition, no system column is allowed"
>
> Yeah, that's kinda half-true since the code below supports TABLEOID and
> OID system columns. I am thinking about this in a larger context though.
> Peter has expressed a desire to support system columns in the WHEN
> targetlist and quals. I gave it a try and it seems that if we remove that
> error block, all system columns are supported readily. But only from the
> target side. There is a problem if we try to refer to a system column
> from the source side, since the mergeSourceTargetList only includes user
> columns and so set_plan_refs() complains about a system column.
>
> I am not sure what's the best way to handle this. Maybe we can add
> system columns to the mergeSourceTargetList. I haven't yet found a neat
> way to do that.

I was saying the comment needs changing, not the code.

Cool, thanks

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
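To make Pavan's point concrete, the check under discussion has roughly
this shape. This is a sketch, not the patch's exact code, and the error
message text is illustrative.

    case HeapTupleSelfUpdated:

        /*
         * HeapTupleSatisfiesUpdate reports SelfUpdated only when the
         * tuple was changed after the current scan started (cmax >=
         * current command id), so rows updated by an earlier MERGE in
         * the same transaction never reach this error.
         */
        if (TransactionIdIsCurrentTransactionId(hufd.xmax))
            ereport(ERROR,
                    (errcode(ERRCODE_CARDINALITY_VIOLATION),
                     errmsg("MERGE command cannot affect row a second time")));
        break;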
Re: [HACKERS] logical decoding of two-phase transactions
On 23 March 2018 at 15:26, Simon Riggs wrote: > Reviewing 0003-Add-support-for-logging-GID-in-commit-abort-WAL-reco > > Looks fine, reworked patch attached > * added changes to xact.h from patch 4 so that this is a whole, > committable patch > * added comments to make abort and commit structs look same > > Attached patch is proposed for a separate, early commit as part of this Looking to commit "logging GID" patch today, if no further objections. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 17:06, Simon Riggs wrote:
> On 26 March 2018 at 15:39, Pavan Deolasee wrote:
>
> That's all I can see so far.

* change comment "once to" to "once" in src/include/nodes/execnodes.h
* change comment "and to run" to "and once to run"
* change "result relation" to "target relation"
* XXX we probably need to check plan output for CMD_MERGE also
* Spurious line feed in src/backend/optimizer/prep/preptlist.c
* No need to remove whitespace in src/backend/optimizer/util/relnode.c
* README should note that the TABLEOID junk column is not strictly
  needed when joining to a non-partitioned table but we don't try to
  optimize that away. Is that an XXX to fix in future or do we just
  think the extra 4 bytes won't make much difference so we leave it?
* Comment in rewriteTargetListMerge() should mention TABLEOID exists to
  allow us to find the correct relation, not the correct row; the
  comment was just copied from the CTID comment above it.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 23:10, Peter Geoghegan wrote:
> On Mon, Mar 26, 2018 at 12:17 PM, Simon Riggs wrote:
>>> As far as I
>>> know, the proposed MERGE patch has that issue and existing DML commands
>>> don't; but someone else may have better information.
>>
>> I will look deeper and report back.
>
> It's quite clear that the problem exists with the MERGE patch

Accepted; the only question is whether it affects UPDATE as well,
because it looks like it should.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 17:52, Robert Haas wrote:
> On Mon, Mar 26, 2018 at 12:16 PM, Simon Riggs wrote:
>> On 26 March 2018 at 16:09, Robert Haas wrote:
>>> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs wrote:
>>>> Since we now have MVCC catalog scans, all the name lookups are
>>>> performed using the same snapshot so in the above scenario the newly
>>>> created object would be invisible to the second name lookup.
>>>
>>> That's not true, because each lookup would be performed using a new
>>> snapshot -- not all under one snapshot.
>>
>> You're saying we take a separate snapshot for each table we look up?
>> Sounds weird to me.
>
> I'm saying we take a separate snapshot for each and every catalog
> lookup, except when we know that no catalog changes can have occurred.
> See the commit message for 568d4138c646cd7cd8a837ac244ef2caf27c6bb8.
> If you do a lookup in pg_class and 3 lookups in pg_attribute each of
> the 4 can be done under a different snapshot, in the worst case.
> You're not the first person to believe that the MVCC catalog scan
> patch fixes that problem, but as the guy who wrote it, it definitely
> doesn't. What that patch fixed was, prior to that patch, a catalog
> scan might find the WRONG NUMBER OF ROWS, like you might do a lookup
> against a unique index for an object that existed and, if the row was
> concurrently updated, you might find 0 rows or 2 rows instead of 1
> row. IOW, it guaranteed that we used a consistent snapshot for each
> individual lookup, not a consistent snapshot for the whole course of a
> command.

That all makes sense, thanks for explaining. I spent a few more minutes
going "but", "but", though I can now see good reasons for everything to
work this way.

>> So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.
>>
>> Or you see this as something related specifically to MERGE, if so how?
>> Please explain what you see.
>
> As I said before, the problem occurs if the same command looks up the
> same table name in more than one place. There is absolutely nothing
> to guarantee that we get the same answer every time.
> As far as I know, the proposed MERGE patch has that issue and existing
> DML commands don't; but someone else may have better information.

I will look deeper and report back.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
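For readers following along, the per-lookup snapshot behaviour Robert
describes can be seen in the catalog scan API. A sketch, with the
surrounding variables (rel, key, scan) assumed context:

    /*
     * Passing a NULL snapshot makes systable_beginscan() take a fresh
     * catalog snapshot internally (via GetCatalogSnapshot), so a lookup
     * in pg_class and three lookups in pg_attribute can each run under
     * a different snapshot if invalidation messages arrive in between.
     */
    scan = systable_beginscan(rel, ClassOidIndexId, true,
                              NULL,     /* snapshot taken per scan */
                              1, &key);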
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 16:09, Robert Haas wrote:
> On Mon, Mar 26, 2018 at 5:53 AM, Simon Riggs wrote:
>> Since we now have MVCC catalog scans, all the name lookups are
>> performed using the same snapshot so in the above scenario the newly
>> created object would be invisible to the second name lookup.
>
> That's not true, because each lookup would be performed using a new
> snapshot -- not all under one snapshot.

You're saying we take a separate snapshot for each table we look up?
Sounds weird to me.

So this error could happen in SELECT, UPDATE, DELETE or INSERT as well.

Or do you see this as something related specifically to MERGE? If so,
how? Please explain what you see.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 15:39, Pavan Deolasee wrote:
> reviewer

1. In ExecMergeMatched() we have a line of code that does this...

   if (TransactionIdIsCurrentTransactionId(hufd.xmax))
   then errcode(ERRCODE_CARDINALITY_VIOLATION)

I notice this is correct, but too strong. It should be possible to run
a sequence of MERGE commands inside a transaction, repeatedly updating
the same set of rows, as is possible with UPDATE.

We need to check whether the xid is the current subxid and the cid is
the current commandid, rather than using
TransactionIdIsCurrentTransactionId()

On further analysis, I note that ON CONFLICT suffers from this problem
as well; looks like I just refactored it from there.

2. EXPLAIN ANALYZE looks unchanged from some time back. The math is only
correct as long as there are zero rows that do not cause an INS/UPD/DEL.
We don't test for that. I think this is a regression from an earlier
report of the same bug, or perhaps I didn't fix it at the time.

3. sp. "depedning" in trigger.sgml

4. trigger.sgml: replace "specific actions specified" with "events
specified in actions" to avoid the double use of "specific"

5. I take it the special code for TransformMerge target relations is
replaced by "right_rte"? Seems fragile to leave it like that. Can we
add an Assert()? Do we care?

6. I didn't understand "Assume that the top-level join RTE is at the
end. The source relation is just before that."
What if there is no source relation?

7. The formatting of the SQL statement in transformMergeStmt that begins
"Construct a query of the form" is borked, so the SQL layout is unclear;
it just needs a pretty print.

8. I didn't document why I thought this was true: "XXX if we have a
constant subquery, we can also skip join", but one of the explain
analyze outputs shows this is already true, where we provide a constant
query and it skips the join. So perhaps we can remove the comment.
(Search for "Seq Scan on target t_1")

9. I think we need to mention that MERGE won't work with rules or
inheritance (only partitioning) in the main doc page. The current text
says that rules are ignored, which would be true if we didn't
specifically throw ERROR feature not supported.

10. Comment needs updating for changes in code below it - "In MERGE
when and condition, no system column is allowed"

11. In the comment "Since the plan re-evaluated by EvalPlanQual uses the
second RTE", suggest using "join RTE" to make it more explicit which RTE
we are discussing

12. Missed out merge.sgml from the v25 patch.

13. For triggers we say "No separate triggers are defined for MERGE";
we should also state the same caveat for POLICY events

That's all I can see so far.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 26 March 2018 at 15:39, Pavan Deolasee wrote:
> Now that the ON CONFLICT patch is in, here are rebased patches. The
> second patch is to add support for CTE (thanks Peter).
>
> Apart from the rebase, the following things are fixed/improved:
>
> - Added test cases for column level privileges as suggested by Peter.
> One problem got discovered during the process. Since we expand and
> track the source relation targetlist, the existing code was demanding
> SELECT privileges on all attributes, even though MERGE is only
> referencing a few attributes on which the user has privilege. Fixed
> that by disassociating expansion from the actual referencing.

Good catch Peter.

> - Added a test case for RLS where the SELECT policy actually hides some
> rows, as suggested by Stephen in the past
>
> - Added a check to compare the result relation's and merge target
> relation's OIDs, as suggested by Robert. Simon thinks it's not
> necessary given that we now scan the catalog using an MVCC snapshot.
> So will leave it to his discretion when he takes it up for commit

No problem with adding the test, it's quick to compare two Oids.

> - Improved the explanation regarding why we need a second RTE for the
> merge target relation, and general cleanup/improvements in that area
>
> I think it will be a good idea to send any further patches as add-on
> patches for the reviewer/committer's sake. I will do that unless
> someone disagrees.

+1

So v25 is the "commit candidate" and we can add other patches to it.

Given the recent bugfix/changes I don't plan to commit this tomorrow
anymore.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 24 March 2018 at 12:27, Robert Haas wrote:
> On Sat, Mar 24, 2018 at 8:16 AM, Robert Haas wrote:
>> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan wrote:
>>> While I think this particular HINT buglet is pretty harmless, I
>>> continue to be concerned about the unintended consequences of having
>>> multiple RTEs for MERGE's target table. Each RTE comes from a
>>> different lookup path -- the first one goes through setTargetTable()'s
>>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>>> one goes through transformFromClauseItem(), for the join, which
>>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>>> Consider commit 5f173040, which fixed a privilege escalation security
>>> bug around multiple name lookup. Could the approach taken by MERGE
>>> here introduce a similar security issue?
>>
>> Yeah, that seems really bad. I don't think there is a huge problem
>> with having multiple RTEs; for example, we very commonly end up with
>> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
>> However, generating those RTEs by doing multiple name lookups for the
>> same table is a big problem. Imagine, for example, that a user has a
>> search_path of a, b and that there is a table b.foo. The user does a
>> merge on foo. Between the first name lookup and the second, somebody
>> creates a.foo. Now, potentially, half of the MERGE statement is going
>> to be running against b.foo and the other half against a.foo. I don't
>> know whether that will crash or bomb out with a strange error or just
>> make some unexpected modification to one of those tables, but the
>> behavior, even if not insecure, will certainly be wrong.
>
> If it's possible to identify the two OIDs that are supposed to match
> and cross-check that the OIDs are the same, then we could just bomb
> out with an error if they aren't.

Since we now have MVCC catalog scans, all the name lookups are
performed using the same snapshot, so in the above scenario the newly
created object would be invisible to the second name lookup.

So I don't see any way for the ERROR to occur and hence no need for a
cross-check, for UPDATE or MERGE.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 24 March 2018 at 12:19, Robert Haas wrote: > On Sat, Mar 24, 2018 at 8:11 AM, Simon Riggs wrote: >> On 24 March 2018 at 11:58, Robert Haas wrote: >>> On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote: >>>> I suggest we focus on the engineering. I've not discussed this patch >>>> with Pavan offline. >>> >>> Well then proposing to commit moments after it's been posted is >>> ridiculous. That's exactly the opposite of "focusing on the >>> engineering". >>> >>> This is a patch about which multiple people have expressed concerns. >>> You're trying to jam a heavily redesigned version in at the last >>> minute without adequate review. Please don't do that. >> >> All presumption on your part. Don't tell me what I think and then diss >> me for that. > > It is of course true that I can't know your internal mental state with > certainty, but I don't think it is presumptuous to suppose that a > person who says they're going to commit a patch that hasn't been > reviewed intends to commit a patch that hasn't been reviewed. It seems you really are more intent on talking about my actions than on discussing the patch. Probably would have been better to read what I've actually said beforehand. If you have anything further to say, make it a formal complaint offlist so we can discuss engineering here. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] MERGE SQL Statement for PG11
On 24 March 2018 at 12:16, Robert Haas wrote:
> On Thu, Mar 22, 2018 at 7:13 PM, Peter Geoghegan wrote:
>> While I think this particular HINT buglet is pretty harmless, I
>> continue to be concerned about the unintended consequences of having
>> multiple RTEs for MERGE's target table. Each RTE comes from a
>> different lookup path -- the first one goes through setTargetTable()'s
>> parserOpenTable() + addRangeTableEntryForRelation() calls. The second
>> one goes through transformFromClauseItem(), for the join, which
>> ultimately ends up calling transformTableEntry()/addRangeTableEntry().
>> Consider commit 5f173040, which fixed a privilege escalation security
>> bug around multiple name lookup. Could the approach taken by MERGE
>> here introduce a similar security issue?
>
> Yeah, that seems really bad. I don't think there is a huge problem
> with having multiple RTEs; for example, we very commonly end up with
> both rte->inh and !rte->inh RTEs for the same table, and that is OK.
> However, generating those RTEs by doing multiple name lookups for the
> same table is a big problem. Imagine, for example, that a user has a
> search_path of a, b and that there is a table b.foo. The user does a
> merge on foo. Between the first name lookup and the second, somebody
> creates a.foo. Now, potentially, half of the MERGE statement is going
> to be running against b.foo and the other half against a.foo. I don't
> know whether that will crash or bomb out with a strange error or just
> make some unexpected modification to one of those tables, but the
> behavior, even if not insecure, will certainly be wrong.

MERGE uses multiple RTEs in exactly the same way UPDATE does. I don't
see a reason for specific concern wrt MERGE.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 24 March 2018 at 11:58, Robert Haas wrote: > On Sat, Mar 24, 2018 at 7:42 AM, Simon Riggs wrote: >> I suggest we focus on the engineering. I've not discussed this patch >> with Pavan offline. > > Well then proposing to commit moments after it's been posted is > ridiculous. That's exactly the opposite of "focusing on the > engineering". > > This is a patch about which multiple people have expressed concerns. > You're trying to jam a heavily redesigned version in at the last > minute without adequate review. Please don't do that. All presumption on your part. Don't tell me what I think and then diss me for that. If you have engineering comments, please make them. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
I suggest we focus on the engineering. I've not discussed this patch with Pavan offline. On 23 March 2018 at 23:32, Michael Paquier wrote: > On Fri, Mar 23, 2018 at 11:06:48AM +0000, Simon Riggs wrote: >> Your assumption that I would commit a new patch that was 29 mins old >> is frankly pretty ridiculous, so yes, lets keep calm. > > When a committer says that a patch is "ready for commit" and that he > calls for "last objections", I am understanding that you would be ready > to commit the patch from the moment such an email has been sent. Am I > the only one to think so? Now let's look at the numbers: > - The last patch sent is a v2, which implements a completely new > approach compared to v1. This is a non-trivial patch which touches > sensitive parts of the code. > - v2 has been sent exactly two weeks after the last email exchanged on > this thread. > - Per the data publicly available, it took less than 30 minutes to > review the patch, and there are zero comments about its contents. > I do patch review on a more-or-less daily basis, and look at threads on > hackers on a daily basis, but I really rarely see such an "efficient" > review pattern. You and Pavan have likely discussed the patch offline, > but nobody can guess what has been discussed and what have been the > arguments exchanged. > >> Enjoy your weekend and I'll be happy to read your review on Monday. > > Er. So this basically means that I need to do a commitment to look at > this patch in such a short time frame? If you are asking for reviews, > doing such requests by asking a proper question rather than by implying > it in an affirmation would seem more adapted to me, so this email bit is > making me uncomfortable. My apologies if I am not able to catch the > nuance in those words. > -- > Michael -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
On 23 March 2018 at 15:39, Konstantin Knizhnik wrote:
> On 22.03.2018 23:37, Alvaro Herrera wrote:
>>
>> The rd_projidx (list of each nth element in the index list that is a
>> projection index) thing looks odd. Wouldn't it make more sense to have
>> a list of index OIDs that are projection indexes?
>
> rd_projidx is not a list, it is a Bitmapset. It is just one of many
> bitmap sets in RelationData:
>
>     Bitmapset  *rd_indexattr;  /* identifies columns used in indexes */
>     Bitmapset  *rd_keyattr;    /* cols that can be ref'd by foreign keys */
>     Bitmapset  *rd_pkattr;     /* cols included in primary key */
>     Bitmapset  *rd_idattr;     /* included in replica identity index */
>
> Yes, it is possible to construct an Oid list of projectional indexes
> instead of having a bitmap which marks some indexes as projectional,
> but I am not sure that it will be more efficient from either a CPU or a
> memory footprint point of view (in most cases the bitmap will consist
> of just one integer). In any case, I do not think it can have a
> measurable impact on performance or memory size: the number of indexes,
> and especially projectional indexes, will very rarely exceed 10.

Alvaro's comment seems reasonable to me.

The patch adds both rd_projindexattr and rd_projidx. rd_projindexattr
should be a Bitmapset, but rd_projidx looks strange now he mentions it.
Above that in RelationData we have other structures that are Lists of
OIDs, so Alvaro's proposal makes sense. That would simplify the code in
ProjectionIsNotChanged() by just looping over the list of projection
indexes rather than the list of all indexes.

So please could you make the change?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
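For illustration, the loop Simon describes could look like this. The
field name rd_projidxlist is hypothetical; the committed design may
differ.

    ListCell   *lc;

    /*
     * Walk a List of projection-index OIDs directly, instead of testing
     * each index's position in the index list against a Bitmapset.
     */
    foreach(lc, relation->rd_projidxlist)
    {
        Oid         indexOid = lfirst_oid(lc);

        /* re-evaluate this projection index's expression here ... */
    }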
Re: [HACKERS] logical decoding of two-phase transactions
On 5 March 2018 at 16:37, Nikhil Sontakke wrote:
>> I will re-submit with "git format-patch" soon.
>
> PFA, patches in "format-patch" format.
>
> This patch set also includes changes in the test_decoding plugin along
> with an additional savepoint related test case that was pointed out on
> this thread, upstream.

Reviewing 0001-Cleaning-up-and-addition-of-new-flags-in-ReorderBuff.patch

The change from is_known_as_subxact to rbtxn_is_subxact loses some
meaning, since rbtxn entries with this flag set to false might still be
subxacts; we just don't know yet.

rbtxn_is_serialized refers to RBTXN_SERIALIZED, so the flag name should
be RBTXN_IS_SERIALIZED so that it matches.

Otherwise looks OK to commit.

Reviewing 0003-Add-support-for-logging-GID-in-commit-abort-WAL-reco

Looks fine, reworked patch attached
* added changes to xact.h from patch 4 so that this is a whole,
  committable patch
* added comments to make the abort and commit structs look the same

Attached patch is proposed for a separate, early commit as part of this

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

logging-GID-in-commit-abort-WAL.v2.patch
Description: Binary data
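The naming point is easiest to see side by side. A sketch of the
suggested rename, keeping the accessor and the flag it tests in step
(the bit value here is illustrative):

    #define RBTXN_IS_SERIALIZED     0x0008

    /* Has this transaction been spilled to disk? */
    #define rbtxn_is_serialized(txn) \
        (((txn)->txn_flags & RBTXN_IS_SERIALIZED) != 0)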
Re: [HACKERS] MERGE SQL Statement for PG11
On 23 March 2018 at 11:26, Pavan Deolasee wrote:
> On Fri, Mar 23, 2018 at 6:45 AM, Peter Geoghegan wrote:
>>
>> On Thu, Mar 22, 2018 at 6:02 PM, Alvaro Herrera wrote:
>> > Incremental development is a good thing. Trying to do everything in a
>> > single commit is great when time is infinite or even merely very long,
>> > but if you run out of it, which I'm sure is common, leaving some things
>> > out that can be reasonably implemented in a separate patch is perfectly
>> > acceptable.
>>
>> We're talking about something that took me less than an hour to get
>> working. AFAICT, it's just a matter of tweaking the grammar, and
>> adding a bit of transformWithClause() boilerplate to the start of
>> transformMergeStmt().
>>
>> I quickly implemented CTE support myself (not wCTE support, since
>> MERGE doesn't use RETURNING), and it wasn't tricky. It seems to work
>> when I mechanically duplicate the approach taken with other types of
>> DML statement in the parser. I have written a few tests, and so far it
>> holds up.

Peter, if you have the code and you consider it important that this
subfeature is in PostgreSQL, why not post the code so we can commit it?

Why would we repeat what has already been done?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
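The boilerplate Peter mentions mirrors what transformUpdateStmt() does
today. A sketch, with placement at the start of transformMergeStmt()
assumed:

    if (stmt->withClause)
    {
        /* attach the WITH clause's CTEs to the transformed query */
        qry->hasRecursive = stmt->withClause->recursive;
        qry->cteList = transformWithClause(pstate, stmt->withClause);
        qry->hasModifyingCTE = pstate->p_hasModifyingCTE;
    }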
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 23 March 2018 at 09:22, Michael Paquier wrote: > On Fri, Mar 23, 2018 at 09:04:55AM +0000, Simon Riggs wrote: >> So it shows clear benefit for both bulk actions and OLTP, with no >> regressions. >> >> No objection exists to the approach used in the patch, so I'm now >> ready to commit this. >> >> Last call for objections? > > Please hold on. It is Friday evening here. First I cannot take the > time to articulate an answer you are requesting on the other thread > part. Second, the latest version of the patch has been sent from Pavan > a couple of hours ago, and you are proposing to commit it in a message > sent 29 minutes after the last version has been sent. > > Let's cool down a bit and take some time for reviews, please. You still > have one week until the end of the commit fest anyway. Your assumption that I would commit a new patch that was 29 mins old is frankly pretty ridiculous, so yes, lets keep calm. Enjoy your weekend and I'll be happy to read your review on Monday. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 23 March 2018 at 08:35, Pavan Deolasee wrote:
> On Fri, Mar 9, 2018 at 8:49 PM, Peter Eisentraut wrote:
>>
>> On 2/1/18 19:21, Simon Riggs wrote:
>> > If we really can't persuade you of that, it doesn't sink the patch. We
>> > can have the WAL pointer itself - it wouldn't save space but it would
>> > at least alleviate the spinlock.
>>
>> Do you want to send in an alternative patch that preserves the WAL
>> pointer and only changes the locking?
>
> Sorry for the delay. Here is an updated patch which now replaces xl_prev
> with xl_curr, thus providing similar safeguards against corrupted or
> torn WAL pages, but still providing the benefits of atomic operations.
>
> I repeated the same set of tests and the results are almost similar.
> These tests are done on a different AWS instance though and hence are
> not comparable to the previous tests. What we do in these tests is
> essentially call ReserveXLogInsertLocation() 1M times to reserve 256
> bytes each time, in a single select statement (the function is exported
> and a SQL-callable routine is added for the purpose of the tests).
>
> Patched:
> ========
> #clients    #tps
>  1          34.195311
>  2          29.001584
>  4          31.712009
>  8          35.489272
> 16          41.950044
>
> Master:
> ======
> #clients    #tps
>  1          24.128004
>  2          12.326135
>  4           8.334143
>  8          16.035699
> 16           8.502794
>
> So that's a pretty good improvement across the spectrum.

So it shows clear benefit for both bulk actions and OLTP, with no
regressions.

No objection exists to the approach used in the patch, so I'm now
ready to commit this.

Last call for objections?

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
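The reason the patched numbers scale with client count is the shape of
the reservation fast path the change enables. A minimal sketch, with
illustrative names (the real code must still convert byte positions to
LSNs, skipping page headers, before stamping each record's xl_curr):

    /*
     * With no xl_prev to chain, reserving WAL insert space is a single
     * atomic fetch-add on the insert position instead of a
     * spinlock-protected read-modify-write.
     */
    static uint64
    reserve_insert_bytes(pg_atomic_uint64 *insertpos, uint32 size)
    {
        /* returns the start of our range; our range's end is start + size */
        return pg_atomic_fetch_add_u64(insertpos, size);
    }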
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
On 2 February 2018 at 02:17, Michael Paquier wrote: > On Fri, Feb 02, 2018 at 12:21:49AM +0000, Simon Riggs wrote: >> Yes, it would be about 99% of the time. > > When it comes to recovery, I don't think that 99% is a guarantee > sufficient. (Wondering about the maths behind such a number as well.) > >> But you have it backwards - we are not assuming that case. That is the >> only case that has risk - the one where an old WAL record starts at >> exactly the place the latest one stops. Otherwise the rest of the WAL >> record will certainly fail the CRC check, since it will effectively >> have random data in it, as you say. > > Your patch assumes that a single WAL segment recycling is fine to > escape based on the file name, but you need to think beyond that. It > seems to me that your assumption is wrong if the tail of a segment gets > reused after more cycles than a single one, which could happen when > doing recovery from an archive, where segments used could have junk in > them. So you actually *increase* the odds of problems if a segment is > forcibly switched and archived, then reused in recovery after being > fetched from an archive. This seems to be a pivotal point in your argument, yet it is just an assertion. Please explain for the archive why you think the odds increase in the way you describe. Thanks -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Surjective functional indexes
Please excuse my absence from this thread.

On 2 March 2018 at 12:34, Konstantin Knizhnik wrote:
> On 01.03.2018 22:48, Andres Freund wrote:
>>
>> Hi,
>>
>> I still don't think, as commented upon by Tom and me upthread, that we
>> want this feature in the current form.

The performance benefit of this patch is compelling. I don't see a
comment from Tom along those lines, just a question as to whether the
code is in the right place.

>> Your arguments about layering violations seem to be counteracted by the
>> fact that ProjectionIsNotChanged() basically appears to do purely
>> executor work inside heapam.c.
>
> ProjectionIsNotChanged doesn't perform evaluation itself, it calls
> FormIndexDatum. FormIndexDatum is also called in 13 other places in
> Postgres: analyze.c, constraint.c, index.c, tuplesort.c, execIndexing.c
>
> I still think that trying to store the result of index expression
> evaluation somewhere, to be able to reuse it in the index update, is
> not such a good idea: it complicates the code and adds extra
> dependencies between different modules. And the benefits of such an
> optimization are not obvious: if index expression evaluation is so
> expensive, then recheck_on_update should be prohibited for such an
> index altogether.

Agreed. What we are doing is trying to avoid losing HOT updates. We make
the check for HOT update dynamically when it could be made statically in
some cases. Simplicity says just test it dynamically.

We could also do work to check for HOT update for functional indexes
earlier, but for the same reason I don't think it's worth adding.

The patch itself is about as simple as it can be, so the code is in the
right place. My problems with it have been around the actual user
interface to request it. Index option handling has changed (and this
needs a rebase!), but other than that I think we want this and I am
planning to commit something early next week.

--
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
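For readers following along, the dynamic recheck being discussed works
roughly like this. The FormIndexDatum() and datumIsEqual() calls are
real, but the surrounding variables (indexInfo, oldSlot, newSlot,
estate, att) are assumed context, and only the first index column is
checked for brevity:

    Datum       old_values[INDEX_MAX_KEYS];
    bool        old_isnull[INDEX_MAX_KEYS];
    Datum       new_values[INDEX_MAX_KEYS];
    bool        new_isnull[INDEX_MAX_KEYS];
    bool        changed = false;

    /* Evaluate the index expression for the old and new tuple versions. */
    FormIndexDatum(indexInfo, oldSlot, estate, old_values, old_isnull);
    FormIndexDatum(indexInfo, newSlot, estate, new_values, new_isnull);

    if (old_isnull[0] != new_isnull[0])
        changed = true;
    else if (!old_isnull[0] &&
             !datumIsEqual(old_values[0], new_values[0],
                           att->attbyval, att->attlen))
        changed = true;

    /* If no projection index's value changed, the update can stay HOT. */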