Re: [HACKERS] CTE inlining

2017-04-29 Thread Craig Ringer
On 30 Apr. 2017 13:28, "Andres Freund"  wrote:

On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences.  Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem?  Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining a CTE without volatile functions twice?


Exactly.

The initial implementation had limitations. So they got documented as
features, not bugs or possible future enhancements. Yay? So we're stuck
with it forever?

I agree we shouldn't break working, correct queries such that they return
different results. But if someone is lying about volatility they don't get
the expectation of correctness. And we have a policy against hints, so
surely we should be keen to remove this hack that serves as a hint - right?

We have OFFSET 0 for anyone really depending on it, and at least when you
read that you know to go "wtf" and look at the manual, whereas the CTE fence
behaviour is invisible and silent.

Yes, experienced and established postgres users expect the optimisation
fence behaviour. They abuse it as a query hint or work around it with
subqueries in FROM. They also know OFFSET 0 ... and ideally should even
read the relnotes. Users from other DBMSes looking to migrate, and new
users, are regularly surprised by our CTEs. I see it a lot on Stack
Overflow and other places outside our comfortable walls.

Personally I find it very annoying when I'd like to use CTEs to structure
queries more readably, but end up having to use subqueries in FROM instead.

Like the work Andres has been doing on our bizarre handling of SRFs in the
SELECT target list, I really think it's just something that needs to be done.


Re: [HACKERS] CTE inlining

2017-04-29 Thread Andres Freund
On 2017-04-30 00:28:46 -0400, Tom Lane wrote:
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences.  Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.

If all referenced functions are non-volatile, I don't quite see the
problem?  Personally I believe we'll have to offer a proper
anti-inlining workaround anyway, and in that case there's really nothing
that should stop us from inlining a CTE without volatile functions twice?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CTE inlining

2017-04-29 Thread Pavel Stehule
2017-04-30 6:28 GMT+02:00 Tom Lane :

> Craig Ringer  writes:
> > - as you noted, it is hard to decide when it's worth inlining vs
> > materializing for CTE terms referenced more than once.
>
> [ raised eyebrow... ]  Please explain why the answer isn't trivially
> "never".
>
> There's already a pretty large hill to climb here in the way of
> breaking peoples' expectations about CTEs being optimization
> fences.  Breaking the documented semantics about CTEs being
> single-evaluation seems to me to be an absolute non-starter.
>
>
Why can we not introduce a GUC option, enable_cteoptfence?

Regards

Pavel



> regards, tom lane


Re: [HACKERS] CTE inlining

2017-04-29 Thread Tom Lane
Craig Ringer  writes:
> - as you noted, it is hard to decide when it's worth inlining vs
> materializing for CTE terms referenced more than once.

[ raised eyebrow... ]  Please explain why the answer isn't trivially
"never".

There's already a pretty large hill to climb here in the way of
breaking peoples' expectations about CTEs being optimization
fences.  Breaking the documented semantics about CTEs being
single-evaluation seems to me to be an absolute non-starter.

regards, tom lane




Re: [HACKERS] CTE inlining

2017-04-29 Thread Craig Ringer
On 30 Apr. 2017 07:56, "Ilya Shkuratov"  wrote:

Hello, dear hackers!

There is a task in the TODO list about optionally disabling the CTE
optimization fence.

I am not interested at this point in the implementation of the disabling
mechanism, but I would like to discuss the optimization mechanism that
should work when the fence is disabled.


It's worth looking at what other DBMSes do.

Notably MS SQL Server. AFAIK its CTEs are a lot like query-scoped views.
They are simply updatable where possible, so you can write

WITH x AS (...)
UPDATE x SET ...

I do not know how MS SQL handles inlining and pullup/pushdown vs
materialization, multiple evaluation costs, etc.

This is the model I would want to aim for.



It seems that we can replace a CTE with a subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe that is because everyone knows the idea is bad and it is
not worth discussing. But I hope it is not, so I am starting this thread. =))


It's not bad for SELECT.

But there are complexities.

- CTE terms may contain data-mutating functions that people rely on not
being executed more than once;

- we document that in Postgres CTEs act as optimisation fences even with
the standard syntax, so users rely on this as a query hint. Personally I
want to call this out in the relnotes and tell people to use our OFFSET 0
hint instead, or add a NOINLINE option to our CTEs, then make pg allow
inlining by default. This is a BC break, but not a big one if we restrict
inlining of CTEs containing volatile functions. And since we don't have
query hints (*cough*) by project policy, we can't really object to removing
one, can we?

- as you noted, it is hard to decide when it's worth inlining vs
materializing for CTE terms referenced more than once. We should possibly
start small and only inline single-reference terms in the first release.
We'd continue to force materialization of multiple-reference terms.

That'd at least help people who use CTEs to write clearer queries not
suffer for it. And it'd give us experience to help with conservatively
introducing multiple-reference inlining.


Re: [HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?

2017-04-29 Thread Tom Lane
Thomas Munro  writes:
> I was reading xact.c and noticed this block:
> ...
> Isn't this insufficient on non-TSO systems like POWER and Arm?

Yeah, I think you're right.  That code probably predates our support
for memory barriers, so "volatile" was the best we could do at the
time --- but as you say, it doesn't fix hardware-level rearrangements.

regards, tom lane




[HACKERS] A misconception about the meaning of 'volatile' in GetNewTransactionId?

2017-04-29 Thread Thomas Munro
Hi hackers,

I was reading xact.c and noticed this block:

        /*
         * Use volatile pointer to prevent code rearrangement; other backends
         * could be examining my subxids info concurrently, and we don't want
         * them to see an invalid intermediate state, such as incrementing
         * nxids before filling the array entry.  Note we are assuming that
         * TransactionId and int fetch/store are atomic.
         */
        volatile PGPROC *myproc = MyProc;
        volatile PGXACT *mypgxact = MyPgXact;

        if (!isSubXact)
            mypgxact->xid = xid;
        else
        {
            int nxids = mypgxact->nxids;

            if (nxids < PGPROC_MAX_CACHED_SUBXIDS)
            {
                myproc->subxids.xids[nxids] = xid;
                mypgxact->nxids = nxids + 1;

Isn't this insufficient on non-TSO systems like POWER and Arm?  It
uses volatile qualifiers as a compiler barrier, which is probably
enough for x86 and Sparc in TSO mode, but doesn't include a memory
barrier to prevent hardware reordering.

I think the thing to do here would be to forget about volatile, stick
pg_write_barrier() between those two writes, and stick
pg_read_barrier() between the reads in any code that might read nxids
and then scan xids concurrently, such as TransactionIdIsInProgress().

This is almost exactly the example from the section "Avoiding Memory
Order Bugs" in src/backend/storage/lmgr/README.barrier.

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-29 Thread Tom Lane
David Rowley  writes:
> On 29 April 2017 at 15:39, Tom Lane  wrote:
>> I'm kind of strongly tempted to apply the second patch; but it would
>> be fair to complain that reduce_unique_semijoins() is new development
>> and should wait for v11.  Opinions?

> My vote is for the non-minimal patch. Of course, I'd be voting for
> minimal patch if this was for a minor version release fix, but we're
> not even in beta yet for v10. The original patch was intended to fix
> cases like this, although I'd failed to realise this particular case.

Yeah, I thought we'd discussed doing something more or less like this
way back in that thread.

After studying the patch some more, I noticed that reduce_unique_semijoins
falsifies the assumption in innerrel_is_unique that we only probe inner
uniqueness for steadily larger relid sets.  If the semijoin LHS is more
than one relation, then it'll test inner uniqueness using that LHS, and if
the proof fails, that's knowledge that can save individual proof attempts
for the individual LHS rels later on during the join search.  So in the
attached, I've modified reduce_unique_semijoins's API a bit more to allow
the caller to override the don't-cache heuristic.

Also, this form of the patch is an incremental patch over the minimal
fix I posted yesterday.  It seems like a good idea to push it as a
separate commit, if only for future bisection purposes.

If I don't hear objections, I'll push this tomorrow sometime.

regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 39e2ddd..c130d2f 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*** add_paths_to_joinrel(PlannerInfo *root,
*** 126,138 
  	 *
  	 * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
  	 * matter since the executor can make the equivalent optimization anyway;
! 	 * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, if
! 	 * the LHS covers all of the associated semijoin's min_lefthand, then it's
! 	 * appropriate to set inner_unique because the path produced by
! 	 * create_unique_path will be unique relative to the LHS.  (If we have an
! 	 * LHS that's only part of the min_lefthand, that is *not* true.)  For
! 	 * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape
! 	 * this module.
  	 */
  	switch (jointype)
  	{
--- 126,140 
  	 *
  	 * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't
  	 * matter since the executor can make the equivalent optimization anyway;
! 	 * we need not expend planner cycles on proofs.  For JOIN_UNIQUE_INNER, we
! 	 * must be considering a semijoin whose inner side is not provably unique
! 	 * (else reduce_unique_semijoins would've simplified it), so there's no
! 	 * point in calling innerrel_is_unique.  However, if the LHS covers all of
! 	 * the semijoin's min_lefthand, then it's appropriate to set inner_unique
! 	 * because the path produced by create_unique_path will be unique relative
! 	 * to the LHS.  (If we have an LHS that's only part of the min_lefthand,
! 	 * that is *not* true.)  For JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid
! 	 * letting that value escape this module.
  	 */
  	switch (jointype)
  	{
*** add_paths_to_joinrel(PlannerInfo *root,
*** 145,156 
  			   outerrel->relids);
  			break;
  		case JOIN_UNIQUE_OUTER:
! 			extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
! 	JOIN_INNER, restrictlist);
  			break;
  		default:
! 			extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel,
! 	jointype, restrictlist);
  			break;
  	}
  
--- 147,166 
  			   outerrel->relids);
  			break;
  		case JOIN_UNIQUE_OUTER:
! 			extra.inner_unique = innerrel_is_unique(root,
! 	outerrel->relids,
! 	innerrel,
! 	JOIN_INNER,
! 	restrictlist,
! 	false);
  			break;
  		default:
! 			extra.inner_unique = innerrel_is_unique(root,
! 	outerrel->relids,
! 	innerrel,
! 	jointype,
! 	restrictlist,
! 	false);
  			break;
  	}
  
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 69b9be4..34317fe 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*** static bool rel_is_distinct_for(PlannerI
*** 42,48 
  	List *clause_list);
  static Oid	distinct_col_search(int colno, List *colnos, List *opids);
  static bool is_innerrel_unique_for(PlannerInfo *root,
! 	   RelOptInfo *outerrel,
  	   RelOptInfo *innerrel,
  	   JoinType jointype,
  	   List *restrictlist);
--- 42,48 
  	List *clause_list);
  static Oid	distinct_col_search(int colno, List *colnos, List *opids);
  static bool is_innerrel_unique_for(PlannerInfo *root,
! 	   Relids ou

Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-29 Thread David Rowley
On 29 April 2017 at 15:39, Tom Lane  wrote:
> I'm kind of strongly tempted to apply the second patch; but it would
> be fair to complain that reduce_unique_semijoins() is new development
> and should wait for v11.  Opinions?

My vote is for the non-minimal patch. Of course, I'd be voting for
minimal patch if this was for a minor version release fix, but we're
not even in beta yet for v10. The original patch was intended to fix
cases like this, although I'd failed to realise this particular case.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] CTE inlining

2017-04-29 Thread Ilya Shkuratov
Hello, dear hackers!

There is a task in the TODO list about optionally disabling the CTE
optimization fence.

I am not interested at this point in the implementation of the disabling
mechanism, but I would like to discuss the optimization mechanism that
should work when the fence is disabled.


It seems that we can replace a CTE with a subquery, so the optimizer
can do all available optimizations. This idea is quite
straightforward, but I could not find a discussion of it.
(Maybe that is because everyone knows the idea is bad and it is
not worth discussing. But I hope it is not, so I am starting this thread. =))

First of all, for such a replacement to be valid, the CTE must
1. be non-writable (e.g. be of the form SELECT ...),
2. not use VOLATILE or STABLE functions,
3. ... (maybe there must be more restrictions?)

Also, before inlining, we should check that some optimization
can be applied, using functions from
'pull_up_subqueries_recurse' and 'subquery_push_qual'.

If that check passes, and there is only one reference to the CTE,
we can inline it immediately.


What is not clear is how we should estimate whether inlining is
worthwhile when there are multiple references. Here are my preliminary
ideas.


Let us consider the "pull up subquery" and "push down qualifiers" cases
separately.

For "push down qualifiers", if `subquery_push_qual` is `true`,
we can do the following:
1. copy the CTE subquery,
2. push down the quals,
3. find paths,
4. inline if cost of
(CTE scan) > (cheapest_path(subquery) + subquery scan)

Probably, this approach is not feasible, because it involves replanning
the subquery, and we should consider a more "lightweight" heuristic.

For "pull up subquery", a similar approach may lead to planning the whole
query twice, which is almost surely too expensive.
So I wonder: is it possible to estimate a join predicate's selectivity
against the CTE subquery's result and inline it if the selectivity is
"high" enough? (If that is possible, the same can be applied to the first
case.)


I would be glad to hear feedback on described approach.

Ilya Shkuratov




Re: [HACKERS] snapbuild woes

2017-04-29 Thread Noah Misch
On Fri, Apr 21, 2017 at 10:34:58PM -0700, Andres Freund wrote:
> I've a bunch of tests, but I don't quite know whether we can expose all
> of them via classical tests.  There are several easy ones that I
> definitely want to add (import "empty" snapshot; import snapshot with
> running xacts; create snapshot, perform some ddl, import snapshot,
> perform some ddl, check things work reasonably crazy), but there's
> enough others that are just probabilistic.  I was wondering about adding
> a loop that simply runs for like 30s and then quits or such, but who
> knows.

If the probabilistic test catches the bug even 5% of the time in typical
configurations, the buildfarm will rapidly identify any regression.  I'd
choose a 7s test that detects the bug 5% of the time over a 30s test that
detects it 99% of the time.  (When I wrote src/bin/pgbench/t/001_pgbench.pl
for a probabilistic bug, I sized that test to finish in 1s and catch its bug
half the time.  In its case, only two buildfarm members were able to
demonstrate the original bug, so 5% detection would have been too low.)
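
To put numbers on that tradeoff (my arithmetic, not Noah's): if a test
detects the bug with probability p on each run, and the buildfarm performs
N independent runs, the chance of the bug escaping entirely is

    P(miss) = (1 - p)^N

so with p = 0.05 and N = 100 runs, (0.95)^100 ≈ 0.006. Even a 5% per-run
detection rate leaves well under a 1% chance of a regression going
unnoticed across a hundred buildfarm runs.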




Re: [HACKERS] Time based lag tracking for logical replication

2017-04-29 Thread Petr Jelinek
On 23/04/17 01:10, Petr Jelinek wrote:
> Hi,
> 
> The time based lag tracking commit [1] added an interface for logging
> progress of replication so that we can report lag as a time interval
> instead of just bytes. But the patch didn't contain a patch for the
> builtin logical replication.
> 
> So I wrote something that implements this. I didn't much like the API
> layering in terms of exporting the walsender's LagTrackerWrite() for
> use by the plugin directly. Normally an output plugin does not have to
> care whether it's running under walsender or not; it uses an abstracted
> write interface for that, which can be implemented in various ways
> (that's how we implement the SQL interface to logical decoding, after
> all). So I decided to add another function to the logical decoding
> write API called update_progress and call that one from the output
> plugin. The walsender then implements that new API to call
> LagTrackerWrite(), while the SQL interface just does not implement it
> at all. This seems like a cleaner way of doing it.
> 

The original no longer applies after Andres committed some of my fixes
for the snapshot builder, so here is a rebased version.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Add-support-for-time-based-lag-tracking-to-logical-170429.patch
Description: binary/octet-stream



Re: [HACKERS] Re: logical replication and PANIC during shutdown checkpoint in publisher

2017-04-29 Thread Noah Misch
On Tue, Apr 25, 2017 at 03:26:06PM -0400, Peter Eisentraut wrote:
> On 4/20/17 11:30, Peter Eisentraut wrote:
> > On 4/19/17 23:04, Noah Misch wrote:
> >> This PostgreSQL 10 open item is past due for your status update.
> >> Kindly send a status update within 24 hours, and include a date for
> >> your subsequent status update.  Refer to the policy on open item
> >> ownership:
> >> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> > 
> > We have a possible solution but need to work out a patch.  Let's say
> > next check-in on Monday.
> 
> Update: We have a patch that looks promising, but we haven't made much
> progress in reviewing the details.  I'll work on it this week, and
> perhaps Michael also has time to work on it this week.  We could use
> more eyes.  Next check-in Friday.

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] convert EXISTS to inner join gotcha and bug

2017-04-29 Thread Teodor Sigaev

> Really, the way to fix Teodor's complaint is to recognize that the
> semijoin inner rel is effectively unique against the whole outer rel,
> and then strength-reduce the semijoin to a plain join.  The infrastructure
> we built for unique joins is capable of proving that, we just weren't
> applying it in the right way.

Perfect, it works. Thank you! The second patch reduces the runtime of the
full query (my example was just a small part) from 20 minutes to 20 seconds.

> I'm kind of strongly tempted to apply the second patch; but it would
> be fair to complain that reduce_unique_semijoins() is new development
> and should wait for v11.  Opinions?

Obviously, I'm voting for the second patch to be applied to version 10.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-29 Thread Tom Lane
Dmitriy Sarafannikov  writes:
>> Maybe we need another type of snapshot that would accept any
>> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,

> If I understood correctly, this new type of snapshot would help if
> there are long-running transactions which can see these tuples.
> But if there are no long-running transactions, it will be the same.
> Am I right?

Right.  You haven't shown us much about the use-case you're concerned
with, so it's not clear what's actually needed.

> And what about not fetching actual min and max values from indexes
> whose columns are not involved in the join?

We don't fetch that info unless we need it.

I'm not entirely certain, but there could be cases where a single
planning cycle ends up fetching that data more than once.  (There's
caching at the RestrictInfo level, but that might not be enough.)
So a line of thought that might be worth looking into is adding a
lower layer of caching to make sure it's not done more than once per
plan.  Again, whether this saves anything would depend a lot on
specific use-cases.

regards, tom lane




Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range

2017-04-29 Thread Dmitriy Sarafannikov

> Maybe we need another type of snapshot that would accept any
> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,
> but a tuple that was live more recently than the xmin horizon seems
> like it's acceptable enough.  HeapTupleSatisfiesVacuum already
> implements the right behavior, but we don't have a Snapshot-style
> interface for it.


If I understood correctly, this new type of snapshot would help if
there are long-running transactions which can see these tuples.
But if there are no long-running transactions, it will be the same.
Am I right?

And what about not fetching actual min and max values from indexes
whose columns are not involved in the join?
