Re: COPY FROM WHEN condition

2019-01-28 Thread David Rowley
On Wed, 23 Jan 2019 at 06:35, Tomas Vondra  wrote:
> It turned out to be a tad more complex due to partitioning, because when
> we find the partitions do not match, the tuple is already allocated in
> the "current" context (be it per-tuple or batch). So we can't just free
> the whole context at that point. The old code worked around this by
> alternating two contexts, but that seems a bit too cumbersome to me, so
> the patch simply copies the tuple to the new context. That allows us to
> reset the batch context always, right after emptying the buffer. I need
> to do some benchmarking to see if the extra copy causes any regression.

I agree that fixing the problem mentioned by separating out tuple and
batch contexts is a good idea, but I disagree with removing the
alternating batch context idea.  FWIW the reason the alternating
contexts went in wasn't just for fun, it was on performance grounds.
There's a lengthy discussion in [1] about not adding any new
performance regressions to COPY. To be more specific, Peter complains
about a regression of 5% in [2].

It's pretty disheartening to see the work there being partially undone.

Here are my performance tests with and without your change to the
memory contexts (I missed where you posted your results).

$ cat bench.pl
for (my $i=0; $i < 8912891; $i++) {
print "1\n1\n2\n2\n";
}
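
For reference, listp is a list-partitioned table.  A minimal sketch of the
kind of setup assumed here (partition layout guessed from the generated
values, not taken from the original message):

create table listp (a int) partition by list (a);
create table listp1 partition of listp for values in (1);
create table listp2 partition of listp for values in (2);
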
36a1281f86c: (with your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 26825.142 ms (00:26.825)
Time: 27202.117 ms (00:27.202)
Time: 26266.705 ms (00:26.267)

4be058fe9ec: (before your change)

postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
COPY 35651564
Time: 25645.460 ms (00:25.645)
Time: 25698.193 ms (00:25.698)
Time: 25737.251 ms (00:25.737)

So it looks like your change slows this code down 4% for this test case.
That's about twice as bad as the 2.2% regression that I had to solve
for the multi-insert partition patch (much of whose code you've now
removed).

I'd really like to see the alternating batch context code being put
back in to fix this. Of course, if you have a better idea, then we can
look into that, but just removing code that was carefully written and
benchmarked without any new benchmarks to prove that it's okay to do
so seems wrong to me.

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f93DeHN%2B9RrD9jYn0iF_o89w2B%2BU8-Ao5V1kd8Cf7oSGQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/2b40468d-6be0-e795-c363-0015bc9fa747%402ndquadrant.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Rename nodes/relation.h => nodes/pathnodes.h ?

2019-01-28 Thread Alvaro Herrera
On 2019-Jan-28, Tom Lane wrote:

> (There was some mention of trying to split relation.h into multiple
> files, but I fail to see any advantage in that.)

Hmm, nodes/relation.h includes lots of other files and is widely
included.  If we can split it usefully, I vote for that.  However, I
failed to find any concrete proposal for doing that.  I don't have one
ATM but I'd like to keep the door open for it happening at some point.

I do like planner/pathnodes.h as a name, FWIW.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Delay locking partitions during query execution

2019-01-28 Thread Amit Langote
On 2019/01/28 20:27, David Rowley wrote:
> On Mon, 28 Jan 2019 at 20:45, Amit Langote
>  wrote:
>> It seems to me that plancache.c doesn't really need to perform
>> AcquireExecutorLocks()/LockRelationOid() to learn that a partition's
>> reloptions property has changed to discard a generic plan and build a new
>> one.  AFAICT, PlanCacheRelCallback() takes care of resetting a cached plan
>> by observing that an invalidation message that it received  either from
>> the same session or from another session belongs to one of the relations
>> in PlannedStmt.relationOids.  That list must already contain all
>> partitions' OIDs.
> 
> Really? So when you tried my case you properly got a plan with a
> non-parallel Seq Scan on listp1?
> 
> I imagine you didn't with yours since we check for relcache
> invalidations at the start of a transaction.  I performed both my
> EXECUTEs in the same transaction.

Yeah, I performed each EXECUTE in a new transaction, so not the same case
as yours.  Sorry about the noise.

However, I tried the example as you described, and the plan *doesn't*
change due to a concurrent update of reloptions on master (without the
patch) either.

session 1:
begin;
prepare q1 as select count(*) from listp;
explain (costs off, analyze, timing off, summary off) execute q1;
                           QUERY PLAN
─────────────────────────────────────────────────────────────────
 Finalize Aggregate (actual rows=1 loops=1)
   ->  Gather (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate (actual rows=1 loops=3)
               ->  Parallel Append (actual rows=7 loops=3)
                     ->  Parallel Seq Scan on listp1 (actual rows=3 loops=3)
                     ->  Parallel Seq Scan on listp2 (actual rows=5 loops=2)
(8 rows)

session 2:
alter table listp1 set (parallel_workers=0);

session 1:
explain (costs off, analyze, timing off, summary off) execute q1;
                           QUERY PLAN
─────────────────────────────────────────────────────────────────
 Finalize Aggregate (actual rows=1 loops=1)
   ->  Gather (actual rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate (actual rows=1 loops=3)
               ->  Parallel Append (actual rows=7 loops=3)
                     ->  Parallel Seq Scan on listp1 (actual rows=3 loops=3)
                     ->  Parallel Seq Scan on listp2 (actual rows=5 loops=2)
(8 rows)

I then built master with -DCATCACHE_FORCE_RELEASE and the plan does
change, but only because syscache misses here and there result in opening
the corresponding system catalog, which then does AcceptInvalidationMessages().

In the normal build, invalidation messages don't seem to be processed even
by calling AcquireExecutorLocks(), which is perhaps not normal.  It seems
that the following condition in LockRelationOid is never true when called
from AcquireExecutorLocks:

if (res != LOCKACQUIRE_ALREADY_CLEAR)
{
AcceptInvalidationMessages();
MarkLockClear(locallock);
}

Bug, maybe?

Thanks,
Amit




Re: Covering GiST indexes

2019-01-28 Thread Andrey Borodin


> On 29 Jan 2019, at 7:32, Andreas Karlsson  wrote:
> 
> On 1/28/19 7:26 PM, Andrey Borodin wrote:
>>> * I am no fan of the tupdesc vs truncTupdesc separation and think that it 
>>> is a potential hazard, but I do not have any better suggestion right now.
>> B-tree is copying tupdesc every time they truncate tuple. We need tuple 
>> truncation a little more often: when we are doing page split, we have to 
>> form all page tuples, truncated.
>> Initially, I've optimized only this case, but this led to prepared tupledesc 
>> for truncated tuples.
>>> 
>>> * There is no test case for exclusion constraints, and I feel since that is 
>>> one of the more important uses we should probably have at least one such 
>>> test case.
>> Actually, I did not understand this test case. Can you elaborate more on 
>> this? How included attributes should participate in exclude index? What for?
> 
> I mean include a table like below among the tests. I feel like this is a main 
> use case for INCLUDE.
> 
> CREATE TABLE t2 (
>  x int4range,
>  y int,
> 
>  EXCLUDE USING gist (x WITH &&) INCLUDE (y)
> );

Thanks for the explanation. Added this as case 6 to index_including_gist.
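
For anyone following along, exercising such a constraint might look roughly
like this (values made up, and not necessarily what the added test does):

insert into t2 values ('[1,10)'::int4range, 1);
insert into t2 values ('[5,15)'::int4range, 2);  -- rejected: ranges overlap
select y from t2 where x && '[2,3)'::int4range;  -- y available from the index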

>>> * Why the random noise in the diff below? I think using "(c3) INCLUDE (c4)" 
>>> for both gist and rtree results in a cleaner patch.
>> I've used columns with and without opclass in INCLUDE. This led to these 
>> seemingly random changes.
> 
> I mean the diff would be smaller as the below. It also may make sense to make 
> both lines "(c3) INCLUDE (c1, c4)".
> 
> CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
> CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
> CREATE INDEX on tbl USING gist(c3) INCLUDE (c4);
> CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
> CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
> CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
> -CREATE INDEX on tbl USING rtree(c1, c2) INCLUDE (c3, c4);
> +CREATE INDEX on tbl USING rtree(c3) INCLUDE (c4);
> CREATE INDEX on tbl USING btree(c1, c2) INCLUDE (c3, c4);
> 

I've taken your version of this test and added all variations of included
attributes.


PFA v7.

Thanks!

Best regards, Andrey Borodin.



0001-Covering-GiST-v7.patch
Description: Binary data


Re: Follow-up on INSERT INTO ... SET ...

2019-01-28 Thread Tom Lane
Sven Berkvens-Matthijsse  writes:
> In 2016, a thread was started about implementing INSERT INTO ... SET ... 
> that included a patch and was basically ready for inclusion in 
> PostgreSQL. However, it seems as though it stagnated for some reason. 
> Does anybody remember this and is there perhaps someone who knows what 
> the current status is? If nobody is working on this any longer, I'd be 
> willing to try to revive the patch for the current code base.
> The thread that I'm talking about can be found at: 
> https://www.postgresql.org/message-id/flat/709e06c0-59c9-ccec-d216-21e38cb5e...@joh.to

Looking at the thread, it seems like Marko lost interest for some
reason, and never submitted a revised patch.

I'm not really sure whether we'd want to support a nonstandard
syntax for this.  I can see that it'd have some usefulness for wide
tables, but is that enough of an argument to risk incompatibility
with future SQL-spec extensions?

Looking at the patch itself, I'd raise two major complaints:

* It looks like the only supported syntax is "INSERT ... SET
set_clause_list", which I guess is meant to be functionally
the same as INSERT ... VALUES with a single values list.
That's not terribly compelling.  I'd expect to be able to
use this syntax for multiple inserted rows.  Maybe allow
something like INSERT ... SET ... FROM ..., where the set-list
entries can use variables emitted by the FROM clause?
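
For concreteness, with a hypothetical table t(a int, b int), the patch only
covers something like

INSERT INTO t SET a = 1, b = 2;   -- same as INSERT INTO t (a, b) VALUES (1, 2)

whereas the extension sketched above would allow, roughly (syntax
illustrative only, nothing here is settled):

INSERT INTO t SET a = s.x, b = s.x * 10 FROM generate_series(1, 3) AS s(x);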

* If I'm reading it right, it blows off multiple-assignment
syntax -- that is, "SET (a,b,c) = row-valued-expr" -- with
the comment

+ * This is different from set_clause_list used in UPDATE because the SelectStmt
+ * syntax already does everything you might want to do in an in INSERT.

I'm unimpressed with that reasoning, because the SQL-standard
syntax already does everything you might want to do with this.

Since this patch was originally submitted, we sweated pretty
hard to upgrade our support of UPDATE's multiple-assignment
syntax so that it handles all interesting cases; so I'd want
INSERT ... SET to be fully on par with UPDATE ... SET if we
do it at all.
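
For reference, the UPDATE forms that an INSERT ... SET would need to match
include, e.g. (table and column names hypothetical):

UPDATE t SET (a, b) = (1, 2);
UPDATE t SET (a, b) = (SELECT s.x, s.y FROM src s WHERE s.id = t.id);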

regards, tom lane



Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Simon Riggs
On Tue, 29 Jan 2019 at 09:55, Tom Lane  wrote:

> Simon Riggs  writes:
> > On Sun, 27 Jan 2019 at 19:17, Tom Lane  wrote:
> >> ... I don't
> >> know whether that would satisfy your concern, because I'm not clear
> >> on what your concern is.
>
> > To be able to extract indexable clauses where none existed before.
>
> That's a pretty vague statement, because it describes what I want
> to do perfectly, but this doesn't:
>
> > Hash functions assume that x = N => hash(x) = hash(N) AND x = N
> > so I want to be able to assume
> > x = K => f(x) = f(K) AND x = K
> > for specific f()
> > to allow indexable operations when we have an index on f(x) only
>
> The problem with that is that if the only thing that's in the query is
> "x = K" then there is nothing to cue the planner that it'd be worth
> expending cycles thinking about f(x).


I agree. That is the equivalent of a SeqScan; the wrong way to approach it.


> Sure, you could hang a planner
> support function on the equals operator that would go off and expend
> arbitrary amounts of computation looking for speculative matches ...
> but nobody is going to accept that as a patch, because the cost/benefit
> ratio is going to be awful for 99% of users.
>
> The mechanism I'm proposing is based on the thought that for
> specialized functions (or operators) like PostGIS' ST_Intersects(),
> it'll be worth expending extra cycles when one of those shows up
> in WHERE.  I don't think that scales to plain-vanilla equality though.
>
> Conceivably, you could turn that around and look for support functions
> attached to the functions/operators that are in an index expression,
> and give them the opportunity to derive lossy indexquals based on
> comparing the index expression to query quals.


That way around is the right way. If an index exists, explore whether it
can be used or not. If there are no indexes with appropriate support
functions, it will cost almost nothing for normal queries.

The problem of deriving potentially useful indexes is more expensive, I
understand.


> I have no particular
> interest in working on that right now, because it doesn't respond to
> what I understand PostGIS' need to be, and there are only so many
> hours in the day.  But maybe it could be made workable in the future.
>

I thought the whole exercise was about adding generic tools for everybody
to use. The Tom I've worked with for more than a few years would not have
said that; that is my normal line! You said PostGIS was looking to
"automatically convert WHERE clauses into lossy index quals." which sounds
very similar to what I outlined.

Either way, thanks.

-- 
Simon Riggs                    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Tom Lane
Simon Riggs  writes:
> On Sun, 27 Jan 2019 at 19:17, Tom Lane  wrote:
>> ... I don't
>> know whether that would satisfy your concern, because I'm not clear
>> on what your concern is.

> To be able to extract indexable clauses where none existed before.

That's a pretty vague statement, because it describes what I want
to do perfectly, but this doesn't:

> Hash functions assume that x = N => hash(x) = hash(N) AND x = N
> so I want to be able to assume
> x = K => f(x) = f(K) AND x = K
> for specific f()
> to allow indexable operations when we have an index on f(x) only

The problem with that is that if the only thing that's in the query is
"x = K" then there is nothing to cue the planner that it'd be worth
expending cycles thinking about f(x).  Sure, you could hang a planner
support function on the equals operator that would go off and expend
arbitrary amounts of computation looking for speculative matches ...
but nobody is going to accept that as a patch, because the cost/benefit
ratio is going to be awful for 99% of users.

The mechanism I'm proposing is based on the thought that for
specialized functions (or operators) like PostGIS' ST_Intersects(),
it'll be worth expending extra cycles when one of those shows up
in WHERE.  I don't think that scales to plain-vanilla equality though.

Conceivably, you could turn that around and look for support functions
attached to the functions/operators that are in an index expression,
and give them the opportunity to derive lossy indexquals based on
comparing the index expression to query quals.  I have no particular
interest in working on that right now, because it doesn't respond to
what I understand PostGIS' need to be, and there are only so many
hours in the day.  But maybe it could be made workable in the future.

regards, tom lane



Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2019-01-28 Thread Takashi Menjo
Hi,

Peter Eisentraut wrote:
> When you manage the WAL (or perhaps in the future relation files)
> through PMDK, is there still a file system view of it somewhere, for
> browsing, debugging, and for monitoring tools?

First, I assume that our patchset is used with a filesystem that supports
the direct access (DAX) feature, and I test it with ext4 on Linux.  You can
cd into a pg_wal directory created by initdb -X pg_wal on such a filesystem,
and ls the WAL segment files managed by PMDK at runtime.

As for each PostgreSQL-specific tool, perhaps yes, but I have not tested
them yet.  At least, pg_waldump appears to work as before.

Regards,
Takashi

-- 
Takashi Menjo - NTT Software Innovation Center







Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Simon Riggs
On Sun, 27 Jan 2019 at 19:17, Tom Lane  wrote:


> > * Allow a normal term to match a functional index, e.g. WHERE x =
> > 'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'
>
> I'm a bit confused about what you think this example means.  I do
> intend to work on letting extensions define rules for extracting
> index clauses from function calls, because that's the requirement
> that PostGIS is after in the thread that started this.  I don't
> know whether that would satisfy your concern, because I'm not clear
> on what your concern is.
>

To be able to extract indexable clauses where none existed before.

Hash functions assume that x = N => hash(x) = hash(N) AND x = N
so I want to be able to assume
x = K => f(x) = f(K) AND x = K
for specific f()
to allow indexable operations when we have an index on f(x) only
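
Concretely, the sort of thing meant here (names hypothetical; only a sketch
of the desired transformation):

create table t (x text);
create index on t (substr(x, 1, 5));
-- given only:   where x = 'abcdefgh'
-- also derive:  where substr(x, 1, 5) = 'abcde' and x = 'abcdefgh'
-- so that the expression index on substr(x, 1, 5) becomes usable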

-- 
Simon Riggs                    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-01-28 Thread Simon Riggs
On Mon, 28 Jan 2019 at 20:15, Robert Haas  wrote:

> On Fri, Jan 25, 2019 at 4:18 PM Alvaro Herrera 
> wrote:
> > Right, the planner/executor "disconnect" is one of the challenges, and
> > why I was trying to keep the old copy of the PartitionDesc around
> > instead of building updated ones as needed.
>
> I agree that would be simpler, but I just don't see how to make it
> work.  For one thing, keeping the old copy around can't work in
> parallel workers, which never had a copy in the first place.  For two
> things, we don't have a really good mechanism to keep the
> PartitionDesc that was used at plan time around until execution time.
> Keeping the relation open would do it, but I'm pretty sure that causes
> other problems; the system doesn't expect any residual references.
>
> I know you had a solution to this problem, but I don't see how it can
> work.  You said "Snapshots have their own cache (hash table) of
> partition descriptors. If a partdesc is requested and the snapshot has
> already obtained that partdesc, the original one is returned -- we
> don't request a new one from partcache."  But presumably this means
> when the last snapshot is unregistered, the cache is flushed
> (otherwise, when?) and if that's so then this relies on the snapshot
> that was used for planning still being around at execution time, which
> I am pretty sure isn't guaranteed.
>
> Also, and I think this point is worthy of some further discussion, the
> thing that really seems broken to me about your design is the idea
> that it's OK to use the query or transaction snapshot to decide which
> partitions exist.  The problem with that is that some query or
> transaction with an old snapshot might see as a partition some table
> that has been dropped or radically altered - different column types,
> attached to some other table now, attached to same table but with
> different bounds, or just dropped.  And therefore it might try to
> insert data into that table and fail in all kinds of crazy ways, about
> the mildest of which is inserting data that doesn't match the current
> partition constraint.  I'm willing to be told that I've misunderstood
> the way it all works and this isn't really a problem for some reason,
> but my current belief is that not only is it a problem with your
> design, but that it's such a bad problem that there's really no way to
> fix it and we have to abandon your entire approach and go a different
> route.  If you don't think that's true, then perhaps we should discuss
> it further.
>
> > > I suppose the reason for this is so
> > > that we don't have to go to the expense of copying the partition
> > > bounds from the PartitionDesc into the final plan, but it somehow
> > > seems a little scary to me.  Perhaps I am too easily frightened, but
> > > it's certainly a problem from the point of view of this project, which
> > > wants to let the PartitionDesc change concurrently.
> >
> > Well, my definition of the problem started with the assumption that we
> > would keep the partition array indexes unchanged, so "change
> > concurrently" is what we needed to avoid.  Yes, I realize that you've
> > opted to change that definition.
>
> I don't think I made a conscious decision to change this, and I'm kind
> of wondering whether I have missed some better approach here.  I feel
> like the direction I'm pursuing is an inevitable consequence of having
> no good way to keep the PartitionDesc around from plan-time to
> execution-time, which in turn feels like an inevitable consequence of
> the two points I made above: there's no guarantee that the plan-time
> snapshot is still registered anywhere by the time we get to execution
> time, and even if there were, the associated PartitionDesc may point
> to tables that have been drastically modified or don't exist any more.
> But it's possible that my chain-of-inevitable-consequences has a weak
> link, in which case I would surely like it if you (or someone else)
> would point it out to me.
>
> > I may have forgotten some of your earlier emails on this, but one aspect
> > (possibly a key one) is that I'm not sure we really need to cope, other
> > than with an ERROR, with queries that continue to run across an
> > attach/detach -- moreso in absurd scenarios such as the ones you
> > described where the detached table is later re-attached, possibly to a
> > different partitioned table.  I mean, if we can just detect the case and
> > raise an error, and this let us make it all work reasonably, that might
> > be better.
>
> Well, that's an interesting idea.  I assumed that users would hate
> that kind of behavior with a fiery passion that could never be
> quenched.  If not, then the problem changes from *coping* with
> concurrent changes to *detecting* concurrent changes, which may be
> easier, but see below.
>
> > I think detaching partitions concurrently is a necessary part of this
> > feature, so I would prefer not to go with a solution that works for
> > attaching partitions b

Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-28 Thread Michael Paquier
On Mon, Jan 28, 2019 at 02:00:59PM +0100, Alex Kliukin wrote:
> While reading the doc page for the pg_basebackup, I've been confused
> by the fact that it says WAL files will be written to .tarballs
> (either base.tar or pg_wal.tar) when pg_basebackup is instructed to
> stream WALs alongside the backup itself. I think it makes sense to
> elaborate that it only happens when tar format is specified (doc
> patch is attached).

Agreed.  The current wording can be confusing depending on the format,
and your suggestion looks right.  Any opinions from others?
--
Michael




Re: Built-in connection pooler

2019-01-28 Thread Michael Paquier
On Mon, Jan 28, 2019 at 10:33:06PM +0100, Dimitri Fontaine wrote:
> In other cases, it's important to measure and accept the possible
> performance cost of running a proxy server between the client connection
> and the PostgreSQL backend process. I believe the numbers shown in the
> previous email by Konstantin are about showing the kind of impact you
> can see when using the patch in a use-case where it's not meant to be
> helping much, if at all.

Have you looked at the possibility of having the proxy worker be
spawned as a background worker?  I think that we should avoid spawning
any new kind of backend process from the ground up, as we have had a lot
more infrastructure for that since 9.3.  The patch should actually be
bigger: the code is very raw and lacks comments in a lot of areas where
the logic is not so obvious, except perhaps to the patch author.
--
Michael




Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-28 Thread Michael Paquier
On Tue, Jan 29, 2019 at 12:35:30AM +, Jamison, Kirk wrote:
> I just checked the patch.
> As per advice, you removed the versioning and specified --jobs. 
> The patch still applies, builds and passed the tests successfully.

I would document the optional VACUUM_OPTS on the page of pg_upgrade.
If Peter thinks it is fine to not do so, that's fine for me as well.

It seems to me that the latest patch sent is incorrect for multiple
reasons:
1) You still enforce -j to use the number of jobs that the caller of
pg_upgrade provides, and we agreed upthread that both things are separate
concepts, no?  What Alvaro suggested is to add a comment so that one can
optionally use VACUUM_OPTS with -j, instead of suggesting a full-fledged
vacuumdb command which depends on what pg_upgrade uses.  So there is no
actual need for the if/else complication.
2) Perhaps we need to worry about the second vacuumdb --all command,
which may want custom options, which are not necessarily the same as
the options of the first command?  I don't think we need to care, as it
applies only to a cluster upgraded from something older than 8.4; just
wondering...
--
Michael




Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Michael Paquier
On Mon, Jan 28, 2019 at 05:05:32PM -0500, Tom Lane wrote:
> Yeah, I thought about that too, but it doesn't seem like an improvement.
> If the query is very long (which isn't unlikely) I think people would
> prefer to see the option(s) up front.

Having these options at the front of the WITH clause looks more
natural to me.
--
Michael




Re: pg_stat_ssl additions

2019-01-28 Thread Michael Paquier
On Tue, Jan 29, 2019 at 12:18:29PM +0900, Kyotaro HORIGUCHI wrote:
> At Mon, 28 Jan 2019 14:53:43 +0100, Peter Eisentraut 
>  wrote in 
> <24783370-5acd-e0f3-8eb7-7f42ff2a0...@2ndquadrant.com>
>> This is strange.  The tests work for me, and also on the cfbot.  The
> 
> Agreed. It seemed so also to me.

The tests look sane to me for what it's worth.

> When initializing the SSL context, libpq picks up the root certificate
> from my home directory, not from the test installation, and I had one
> there. It is based not on $HOME but on the pwent, so it is unchangeable
> (and it is the right design for the purpose).

Oops.  I agree that the tests ought to be as isolated as
possible, without optionally fetching things depending on the user
environment.

> +# ssl-related properties may default to files in the users'
> +# environment. Explicitly provide them a value so that they don't
> +# accidentally point to a valid file. Some other common properties are
> +# set here together.
> +# Attach this at the head of $common_connstr.
> +my $def_connstr = "user=ssltestuser dbname=trustdb sslcert=invalid 
> sslkey=invalid sslrootcert=invalid sslcrl=invalid ";
> +

Maybe it is better to just put that in test_connect_ok and
test_connect_fails for only the SSL parameters?  I don't see much
point in enforcing dbname and user.

>  # The server should not accept non-SSL connections.
>  test_connect_fails(
> @@ -185,7 +192,7 @@ test_connect_ok(
>  # Check that connecting with verify-full fails, when the hostname doesn't
>  # match the hostname in the server's certificate.
>  $common_connstr =
> -  "user=ssltestuser dbname=trustdb sslcert=invalid 
> sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR";
> +  $def_connstr."sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR";

I think this is bad, inconsistent style.  Adding the variable within
the quoted section is fine, as much as moving SERVERHOSTADDR out.  But
mixing styles is not.
--
Michael




Re: static global variable openLogOff in xlog.c seems no longer used

2019-01-28 Thread Michael Paquier
On Tue, Jan 29, 2019 at 11:49:37AM +0900, Takashi Menjo wrote:
> Because of pg_pwrite()[1], openLogOff, a static global variable in xlog.c,
> seems taken over by a local variable startoffset and no longer used
> now.

It seems to me that keeping openLogOff is still useful to get a report
about the full chunk area being written if the data gets written in
multiple chunks and fails afterwards.  Your patch would modify the
report so that only the area with the partial write is reported.  For
debugging, having a static reference is also useful in my opinion.
--
Michael




Follow-up on INSERT INTO ... SET ...

2019-01-28 Thread Sven Berkvens-Matthijsse

Hi all,

In 2016, a thread was started about implementing INSERT INTO ... SET ... 
that included a patch and was basically ready for inclusion in 
PostgreSQL. However, it seems as though it stagnated for some reason. 
Does anybody remember this and is there perhaps someone who knows what 
the current status is? If nobody is working on this any longer, I'd be 
willing to try to revive the patch for the current code base.


The thread that I'm talking about can be found at: 
https://www.postgresql.org/message-id/flat/709e06c0-59c9-ccec-d216-21e38cb5e...@joh.to


--
With kind regards,
Sven Berkvens-Matthijsse




Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> I did that now. I couldn't reproduce it locally, despite a lot of
> runs. Looking at the buildfarm it looks like the failures were,
> excluding handfish which failed without recognizable symptoms before and
> after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
> is interesting.

Isn't it now.  Something about the BSD scheduler perhaps?  But we've
got four or five different BSD-ish platforms that reported failures,
and it's hard to believe they've all got identical schedulers.

That second handfish failure does match the symptoms elsewhere:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22

--- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
+++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
@@ -0,0 +1,20 @@
+SQL error: page 0 of relation "test_thread" should be empty but is not on line 125

so it's not quite 100% BSD, but certainly the failure rate on BSD is
way higher than elsewhere.  Puzzling.

regards, tom lane



Re: postgres_fdw: estimate_path_cost_size fails to re-use cached costs

2019-01-28 Thread Etsuro Fujita
(2019/01/28 19:37), Etsuro Fujita wrote:
> (2019/01/25 20:33), Etsuro Fujita wrote:
>> I noticed yet another thing while updating the patch for pushing down
>> ORDER BY LIMIT.  Let me explain.  When costing foreign paths on the
>> basis of local statistics, we calculate/cache the costs of an unsorted
>> foreign path, and re-use them to estimate the costs of presorted foreign
>> paths, as shown below.  BUT: we fail to re-use them for some typical
>> queries, such as "select * from ft1 order by a", due to
>> fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
>> repeatedly.
>>
>>     /*
>>      * We will come here again and again with different set of pathkeys
>>      * that caller wants to cost. We don't need to calculate the cost of
>>      * bare scan each time. Instead, use the costs if we have cached them
>>      * already.
>>      */
>>     if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
>>     {
>>         startup_cost = fpinfo->rel_startup_cost;
>>         run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
>>     }
>>
>> I think we should use "fpinfo->rel_startup_cost >= 0" here, not
>> "fpinfo->rel_startup_cost > 0".  Also, it would be possible that the
>> total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
>> cpu_tuple_cost=0 for the example), so I think we should change the total
>> cost part as well.  Attached is a patch for that.
> 
> I added the commit message.  Updated patch attached.  If no objections,
> I'll apply that to HEAD only as there are no reports of actual trouble
> from this, as far as I know.

Pushed.

Best regards,
Etsuro Fujita




Re: "repliation" as database name

2019-01-28 Thread Kyotaro HORIGUCHI
At Mon, 28 Jan 2019 17:30:57 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190128.173057.41178374.horiguchi.kyot...@lab.ntt.co.jp>
> At Wed, 26 Dec 2018 12:59:32 -0500, Tom Lane  wrote in 
> <32289.1545847...@sss.pgh.pa.us>
> > Hm, I agree that the para doesn't read very well now, but I think this
> > could be improved further.  How about something like
> > 
> > # DATABASE can be "all", "sameuser", "samerole", "replication", a
> > # database name, or a comma-separated list thereof.  The "replication"
> > # keyword matches replication connection requests (see example below).
> > # The "all" keyword matches all database names, but not replication
> > # connections.
> 
> I'm afraid that just dropping "it must be enabled in a separate
> record" leads to confusion. How about adding a comment to
> replication connection examples.
> 
> # Allow replication connections from localhost, by a user with the
> # replication privilege. Each definition must have its own record.

Mmm, this doesn't seem to say what I wanted to say there.
This seems better.

# Allow replication connections from localhost, by a user with
# the replication privilege. They must have separate records from
# non-replication connections.
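
In pg_hba.conf terms, that would mean something along these lines
(illustrative only):

# TYPE  DATABASE      USER  ADDRESS         METHOD
host    all           all   127.0.0.1/32    trust
host    replication   all   127.0.0.1/32    trust

i.e. the "all" record on its own does not cover replication connections.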

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 16:40:36 -0800, Andres Freund wrote:
> On 2019-01-28 15:49:33 -0800, Andres Freund wrote:
> > On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> > > Andres Freund  writes:
> > > > I'm inclined to put back the
> > > >LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> > > >LockRelationForExtension(onerel, ExclusiveLock);
> > > >UnlockRelationForExtension(onerel, ExclusiveLock);
> > > >LockBufferForCleanup(buf);
> > > >if (PageIsNew(page))
> > > > dance regardless, just to get the buildfarm to green?
> > > 
> > > The buildfarm's got half a dozen reports now of a failure of this ilk,
> > > so you'd better do something.
> > 
> > Yea, I was working on a patch. Was trying to come up with an explanation
> > of how this can be realistically hit on the BF, but failed.  I've pushed
> > something now, let's see whether that fixes it.
> 
> It has not. Given that I don't understand what's happening here I'm
> going to revert both commits unless I figure it out in the next ~30min.

I did that now. I couldn't reproduce it locally, despite a lot of
runs. Looking at the buildfarm it looks like the failures were,
excluding handfish which failed without recognizable symptoms before and
after, on BSD-derived platforms (netbsd, freebsd, OS X), which certainly
is interesting. I asked Thomas Munro whether he could run on freebsd,
and he gave me a login, where I just now reproduced the issue (2 of 5
make check runs failing).

Greetings,

Andres Freund



Re: pg_stat_ssl additions

2019-01-28 Thread Kyotaro HORIGUCHI
At Mon, 28 Jan 2019 14:53:43 +0100, Peter Eisentraut 
 wrote in 
<24783370-5acd-e0f3-8eb7-7f42ff2a0...@2ndquadrant.com>
> On 28/01/2019 09:14, Kyotaro HORIGUCHI wrote:
> > 0002:
> > 
> >  The test 54-56 of 001_ssltest.pl failed, which succeeded before
> >  applying 0002. Seems to need to use another user.
> > 
> > #   Failed test 'pg_stat_ssl view without client certificate: no stderr'
> > #   at t/001_ssltests.pl line 313.
> > #  got: 'psql: SSL error: certificate verify failed
> > # FATAL:  no pg_hba.conf entry for host "127.0.0.1", user "ssltestuser", 
> > database "trustdb", SSL off
> > # '
> > 
> > If this is not specific to my environment, the connevcion string
> > at line 313 of 001_ssltests.pl needs sslrootcert setting (, which
> > is feeded to test_connect_ok/fails() via $connstr, not via
> > $common_connstr).
> 
> This is strange.  The tests work for me, and also on the cfbot.  The

Agreed. It seemed so also to me.

> pg_hba.conf method is "trust", and there is nothing that should make it
> do certificate verification for this test.  Do you have any PGSSL*
> environment variables set perhaps?  An interesting OpenSSL version or
> configuration perhaps?

Some further investigation told me that the file
~/.postgresql/root.cert was the culprit.

When initializing the SSL context, libpq picks up the root certificate
from my home directory, not from the test installation, and I had one
there. It is based not on $HOME but on the pwent, so it is unchangeable
(and it is the right design for the purpose).

sslcert, sslkey, sslrootcert and sslcrl share the same characteristic,
so they should be set to an invalid value (namely "invalid") when not
used.

The attached diff file on top of 0002 adds a new variable
$def_connstr for the properties above and some other variables,
then uses it as the first part of $common_connstr.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 2bcbb1b42a..aa0692cc47 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -25,6 +25,13 @@ my $SERVERHOSTADDR = '127.0.0.1';
 # Allocation of base connection string shared among multiple tests.
 my $common_connstr;
 
+# ssl-related properties may default to files in the users'
+# environment. Explicitly provide them a value so that they don't
+# accidentally point to a valid file. Some other common properties are
+# set here together.
+# Attach this at the head of $common_connstr.
+my $def_connstr = "user=ssltestuser dbname=trustdb sslcert=invalid sslkey=invalid sslrootcert=invalid sslcrl=invalid ";
+
 # The client's private key must not be world-readable, so take a copy
 # of the key stored in the code tree and update its permissions.
 copy("ssl/client.key", "ssl/client_tmp.key");
@@ -93,7 +100,7 @@ note "running client tests";
 switch_server_cert($node, 'server-cn-only');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid hostaddr=$SERVERHOSTADDR host=common-name.pg-ssltest.test";
+  $def_connstr."hostaddr=$SERVERHOSTADDR host=common-name.pg-ssltest.test";
 
 # The server should not accept non-SSL connections.
 test_connect_fails(
@@ -185,7 +192,7 @@ test_connect_ok(
 # Check that connecting with verify-full fails, when the hostname doesn't
 # match the hostname in the server's certificate.
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR";
+  $def_connstr."sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR";
 
 test_connect_ok(
 	$common_connstr,
@@ -205,7 +212,7 @@ test_connect_fails(
 switch_server_cert($node, 'server-multiple-alt-names');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
+  $def_connstr."sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
 
 test_connect_ok(
 	$common_connstr,
@@ -236,7 +243,7 @@ test_connect_fails(
 switch_server_cert($node, 'server-single-alt-name');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
+  $def_connstr."sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
 
 test_connect_ok(
 	$common_connstr,
@@ -260,7 +267,7 @@ test_connect_fails(
 switch_server_cert($node, 'server-cn-and-alt-names');
 
 $common_connstr =
-  "user=ssltestuser dbname=trustdb sslcert=invalid sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
+  $def_connstr."sslrootcert=ssl/root+server_ca.crt hostaddr=$SERVERHOSTADDR sslmode=verify-full";
 
 test_connect_ok(
 	$common_connstr,
@@ -280,7 +287,7 @@ test_connect_fails(
 # not a very sensible certificate, but libpq should handle it gracefully.
 switch_server_cert($node, 'server-no-names');
 $common_connstr =
-  "user=ssltestuser dbname=trustdb ssl

Re: Header checking failures on LLVM-less machines

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> I don't think we are actually contradicting each other. The aim of that
> error was to prevent pieces of code that aren't conditional on
> --with-llvm from including llvmjit.h.  I was, for a moment and wrongly,
> thinking that we could style the header in a way that'd make it both
> safe for cpluspluscheck and still error in that case, but that was a
> brainfart.  But even leaving that brainfart aside, I'm not arguing
> against adding those include guards, so ...?

Works for me now.  Thanks for fixing.

regards, tom lane



static global variable openLogOff in xlog.c seems no longer used

2019-01-28 Thread Takashi Menjo
Hi,

Because of pg_pwrite()[1], openLogOff, a static global variable in xlog.c,
seems to have been taken over by a local variable startoffset and is no
longer used now.

I wrote the attached patch that removes openLogOff. Both "make check" and
"make installcheck" passed, and just after that, "pg_ctl -m immediate stop"
then "pg_ctl start" looked OK.

Regards,
Takashi

[1] See commit c24dcd0cfd949bdf245814c4c2b3df828ee7db36.

-- 
Takashi Menjo - NTT Software Innovation Center




Remove-openLogOff.patch
Description: Binary data


Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Andreas Karlsson

On 1/28/19 10:54 PM, Peter Eisentraut wrote:

> Or put it at the end?
>
>     WITH ctename AS ( query ) MATERIALIZED


Hm, seems like that would be easy to miss for long queries.
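
To make the comparison concrete (placement only; neither spelling is
settled at this point):

WITH w AS MATERIALIZED (SELECT 1 AS x) SELECT * FROM w;  -- option up front
WITH w AS (SELECT 1 AS x) MATERIALIZED SELECT * FROM w;  -- option trailing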

Andreas




Re: A few new options for vacuumdb

2019-01-28 Thread Michael Paquier
On Mon, Jan 28, 2019 at 09:58:17PM +, Bossart, Nathan wrote:
> Yes, this simplifies the code quite a bit.  I did it this way in
> v6.

Thanks for the quick update.  Things could have been made a bit more
simple by just using a for loop instead of while, even if the table
list can be NULL for database-wide operations (perhaps that's a matter
of taste..).  prepare_vacuum_command() could be simplified further by
using the server version instead of the full connection string.  The
comments at the top of the routine were not properly updated either,
and the last portion appending the relation name just needs one
appendPQExpBuffer() call instead of three separate calls.  PQclear()
could have been moved closer to where all the query results have been
consumed, to make the whole more consistent.

Anyway, patches 1 and 2 have been merged, and committed after some
cleanup and adjustments.  Patch 3 gets much easier now.

-" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n");
+" ON c.relnamespace OPERATOR(pg_catalog.=) ns.oid\n"
+" LEFT JOIN pg_catalog.pg_class t"
+" ON c.reltoastrelid OPERATOR(pg_catalog.=) t.oid\n");
Why do we need this part?
--
Michael




Re: Covering GiST indexes

2019-01-28 Thread Andreas Karlsson

On 1/28/19 7:26 PM, Andrey Borodin wrote:

>> * I am no fan of the tupdesc vs truncTupdesc separation and think that it is a
>> potential hazard, but I do not have any better suggestion right now.
>
> B-tree is copying tupdesc every time they truncate tuple. We need tuple
> truncation a little more often: when we are doing page split, we have to form
> all page tuples, truncated.
> Initially, I've optimized only this case, but this led to prepared tupledesc
> for truncated tuples.
>
>> * There is no test case for exclusion constraints, and I feel since that is one
>> of the more important uses we should probably have at least one such test case.
>
> Actually, I did not understand this test case. Can you elaborate more on this?
> How included attributes should participate in exclude index? What for?

I mean include a table like below among the tests. I feel like this is a
main use case for INCLUDE.

CREATE TABLE t2 (
  x int4range,
  y int,

  EXCLUDE USING gist (x WITH &&) INCLUDE (y)
);

>> * Why the random noise in the diff below? I think using "(c3) INCLUDE (c4)" for
>> both gist and rtree results in a cleaner patch.
>
> I've used columns with and without opclass in INCLUDE. This led to these
> seemingly random changes.

I mean the diff would be smaller as the below. It also may make sense to
make both lines "(c3) INCLUDE (c1, c4)".


 CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
 CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
 CREATE INDEX on tbl USING gist(c3) INCLUDE (c4);
 CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
 CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
 CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
-CREATE INDEX on tbl USING rtree(c1, c2) INCLUDE (c3, c4);
+CREATE INDEX on tbl USING rtree(c3) INCLUDE (c4);
 CREATE INDEX on tbl USING btree(c1, c2) INCLUDE (c3, c4);

Andreas



Re: jsonpath

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> And even if you, to address Tom's point about plpgsql, had a category
> that could only be thrown by core code,

Actually, that wasn't quite what my point was there.  Even if the error
that is thrown is perfectly safe and was thrown from a known spot in the
core code, elog.c will call ErrorContextCallback functions on the way out,
*before* longjmp'ing back to your code.  So, if PL foo executes a SQL
command that includes jsonpath operations, it's possible for PL foo's
error context reporting code to break whatever assumptions you thought
were safe to make.

Admittedly, a wise PL designer would refrain from assuming too much about
what they can get away with doing in an error context callback.  But
that very same caution should persuade us not to assume too much about
what actually has happened during error processing.

regards, tom lane



Re: speeding up planning with partitions

2019-01-28 Thread David Rowley
On Mon, 28 Jan 2019 at 21:21, Amit Langote
 wrote:
> Thanks for the fix, I'll incorporate it in the next version I'll post by
> tomorrow.

I just started reading 0001 again. I made a few notes:

1. Should this use bms_del_members() instead of bms_difference(), if you
don't mind modifying in place, which it sounds like you don't, going by
the comment?

/*
* Now fix up EC's relids set.  It's OK to modify EC like this,
* because caller must have made a copy of the original EC.
* For example, see adjust_inherited_target_child_root.
*/
cur_ec->ec_relids = bms_difference(cur_ec->ec_relids,
   parent_rel->relids);
cur_ec->ec_relids = bms_add_members(cur_ec->ec_relids,
child_rel->relids);

2.  In the comment:

/*
* Now fix up EC's relids set.  It's OK to modify EC like this,
* because caller must have made a copy of the original EC.
* For example, see adjust_inherited_target_child_root.
*/

You mention that the caller must have made a copy, but the header
comment mentions nothing about that to warn callers.

3. Would it be better to do the following in make_rel_from_joinlist()?

/*
* Check that we got at least one usable path.  In the case of an
* inherited update/delete operation, no path has been created for
* the query's actual target relation yet.
*/
if (!root->inherited_update &&
(!final_rel ||
!final_rel->cheapest_total_path ||
final_rel->cheapest_total_path->param_info != NULL))
elog(ERROR, "failed to construct the join relation");

That way the test is also performed for each partition's join problem
and you don't need that weird special case to disable the check.

4. Do you think it would be nicer if
inheritance_make_rel_from_joinlist returned rel?

inheritance_make_rel_from_joinlist(root, joinlist);

/*
* Return the RelOptInfo of original target relation, although this
* doesn't really contain the final path.  inheritance_planner
* from where we got here will generate the final path, but it will
* do so by iterative over child subroots, not through this
* RelOptInfo.
*/
rel = find_base_rel(root, root->parse->resultRelation);

5. "iterative over child subroots" -> "iterating over the child subroots".

6. In set_inherit_target_rel_sizes() the
!bms_is_member(appinfo->child_relid, live_children) check could be
moved much earlier, likely just after the if (appinfo->parent_relid !=
parentRTindex) check.

I understand you're copying set_append_rel_size() here, but I don't
quite understand the reason why that function is doing it that way.
Seems like wasted effort building the quals for pruned partitions.

7. In set_inherit_target_rel_sizes() I still don't really like the way
you're adjusting the EquivalenceClasses. Would it not be better to
invent a function similar to adjust_appendrel_attrs(), or just use
that function?

8. Still a typo in this comment:

+ * ass dummy.  We must do this in this phase so that the rel's

9. "parent's" -> "the parent's"

/* Also propagate this child's own children into parent's list. */

10. Too many "of":

 * For each child of of the query's result relation, this translates the

11. Badly formed multiline comment.

/* Save the just translated targetlist as unexpanded_tlist in the child's
* subroot, so that this child's own children can use it.  Must use copy

12. DEELETE -> DELETE

/*
* The following fields are set during query planning portion of an
* inherited UPDATE/DEELETE operation.
*/

13. Surely this is not true?

* non-NULL.  Content of each PlannerInfo is same as the parent
* PlannerInfo, except for the parse tree which is a translated copy of
* the parent's parse tree.

Are you not modifying the EquivalenceClasses?

14. This comment plus the variable name is confusing me:

/*
* RelOptInfos corresponding to each child target rel.  For leaf children,
* it's the RelOptInfo representing the output of make_rel_from_joinlist()
* called with the parent rel in the original join tree replaced by a
* given leaf child.  For non-leaf children, it's the baserel RelOptInfo
* itself, left as a placeholder.
*/
List*inh_target_child_joinrels;

The variable name mentions joinrels, but the comment target rels.  A
join rel can't be the target of an UPDATE/DELETE.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: jsonpath

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 17:31:15 -0800, Andres Freund wrote:
> On 2019-01-29 04:17:33 +0300, Alexander Korotkov wrote:
> > I'm probably not yet understanding all the risks this code have.  So far I 
> > see:
> 
> I find these *more* than sufficient to not go to the PG_TRY/CATCH
> approach.
> 
> 
> > 1) One of functions called here performs database modification, while
> > it wasn't suppose to.  So, it becomes not safe to skip subtransaction.
> 
> It's not just data modifications. Even just modifying some memory
> structures that'd normally be invalidated by an xact abort's
> invalidation processing isn't safe.
> 
> 
> > 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason.  So, it
> > might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore.
> 
> It'd e.g. not surprise me very much if some OOM would end up translating
> to ERRCODE_DATA_EXCEPTION, because some library function returned an
> error due to ENOMEM.
> 
> 
> > Could you complete this list?
> 
> 3) The expression changed the current expression context, GUCs or any
>other such global variable. Without a proper subtrans reset this
>state isn't reverted.
> 4) The function acquires an LWLOCK, buffer reference, anything resowner
>owned. Skipping subtrans reset, that's not released in that
>moment. That's going to lead to potential hard deadlocks.
> 99) sigsetjmp is actually pretty expensive.

And even if you, to address Tom's point about plpgsql, had a category
that could only be thrown by core code, and addressed 3) and 4) by
carefully and continuously auditing code, you'd still have the issue of

5) you'd likely leak memory at a potentially prodigious rate...

Greetings,

Andres Freund



Re: jsonpath

2019-01-28 Thread Tom Lane
Alexander Korotkov  writes:
> + if (jspThrowErrors(cxt) ||
> + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION)
> + PG_RE_THROW();

> BTW, this code also looks... useless.  I can't see whole numeric.c
> throwing ERRCODE_DATA_EXCEPTION.   Nikita, what do we mean here?

I think what this code is claiming is that any errcode in the 22xxx
category is safe to trap.  Presumably what it's expecting to see is
ERRCODE_DIVISION_BY_ZERO (22012) or another error that numeric.c
could throw ... but 22xxx covers a heck of a lot of ground.
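
For instance, a few quite unrelated errors all land in class 22:

SELECT 1/0;                  -- 22012 division_by_zero
SELECT 'foo'::int;           -- 22P02 invalid_text_representation
SELECT date '2019-02-30';    -- 22008 datetime_field_overflow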

regards, tom lane



Re: jsonpath

2019-01-28 Thread Alexander Korotkov
On Tue, Jan 29, 2019 at 4:44 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > On Tue, Jan 29, 2019 at 4:03 AM Andres Freund  wrote:
> >> FWIW, I still think this is a terrible idea and shouldn't be merged this
> >> way. The likelihood of introducing subtle bugs seems way too high - even
> >> if it's possibly not buggy today, who says that it's not going to be in
> >> the future?
>
> > I'm probably not yet understanding all the risks this code have.  So far I 
> > see:
> > 1) One of functions called here performs database modification, while
> > it wasn't suppose to.  So, it becomes not safe to skip subtransaction.
> > 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason.  So, it
> > might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore.
> > Could you complete this list?
>
> Sure: every errcode we have is unsafe to treat this way.
>
> The backend coding rule from day one has been that a thrown error requires
> (sub)transaction cleanup to be done to make sure that things are back in a
> good state.  You can *not* just decide that it's okay to ignore that,
> especially not when invoking code outside the immediate area of what
> you're doing.
>
> As a counterexample, for any specific errcode you might claim is safe,
> a plpgsql function might do something that requires cleanup (which is
> not equal to "does data modification") and then throw that errcode.
>
> You might argue that this code will only ever be used to call certain
> functions in numeric.c and you've vetted every line of code that those
> functions can reach ... but even to say that is to expose how fragile
> the argument is.  Every time anybody touched numeric.c, or any code
> reachable from there, we'd have to wonder whether we just introduced
> a failure hazard for jsonpath.  That's not maintainable.  It's even
> less maintainable if anybody decides they'd like to insert extension
> hooks into any of that code.  I rather imagine that this could be
> broken today by an ErrorContextCallback function, for example.
> (That is, even today, "vetting every line" has to include vetting every
> errcontext subroutine in the system, along with everything it can call.
> Plus equivalent code in external PLs and so on.)
>
> We've rejected code like this every time it was submitted to date,
> and I don't see a reason to change that policy for jsonpath.

Tom, Andres, thank you for the explanation.  More than clear now.
I will search for a workaround.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: jsonpath

2019-01-28 Thread Alexander Korotkov
On Tue, Jan 29, 2019 at 4:30 AM Alexander Korotkov
 wrote:
> On Tue, Jan 29, 2019 at 4:03 AM Andres Freund  wrote:
> > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > > + /*
> > > +  * It is safe to use here PG_TRY/PG_CATCH without subtransaction 
> > > because
> > > +  * no function called inside performs data modification.
> > > +  */
> > > + PG_TRY();
> > > + {
> > > + res = DirectFunctionCall2(func, ldatum, rdatum);
> > > + }
> > > + PG_CATCH();
> > > + {
> > > + int errcode = geterrcode();
> > > +
> > > + if (jspThrowErrors(cxt) ||
> > > + ERRCODE_TO_CATEGORY(errcode) != 
> > > ERRCODE_DATA_EXCEPTION)
> > > + PG_RE_THROW();
> > > +
> > > + MemoryContextSwitchTo(mcxt);
> > > + FlushErrorState();
> > > +
> > > + return jperError;
> > > + }
> > > + PG_END_TRY();
>
> BTW, this code also looks... useless.  I can't see whole numeric.c
> throwing ERRCODE_DATA_EXCEPTION.   Nikita, what do we mean here?

Oh, sorry.  I missed that we have ERRCODE_TO_CATEGORY() here.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: jsonpath

2019-01-28 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Jan 29, 2019 at 4:03 AM Andres Freund  wrote:
>> FWIW, I still think this is a terrible idea and shouldn't be merged this
>> way. The likelihood of introducing subtle bugs seems way too high - even
>> if it's possibly not buggy today, who says that it's not going to be in
>> the future?

> I'm probably not yet understanding all the risks this code have.  So far I 
> see:
> 1) One of functions called here performs database modification, while
> it wasn't suppose to.  So, it becomes not safe to skip subtransaction.
> 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason.  So, it
> might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore.
> Could you complete this list?

Sure: every errcode we have is unsafe to treat this way.

The backend coding rule from day one has been that a thrown error requires
(sub)transaction cleanup to be done to make sure that things are back in a
good state.  You can *not* just decide that it's okay to ignore that,
especially not when invoking code outside the immediate area of what
you're doing.

As a counterexample, for any specific errcode you might claim is safe,
a plpgsql function might do something that requires cleanup (which is
not equal to "does data modification") and then throw that errcode.

You might argue that this code will only ever be used to call certain
functions in numeric.c and you've vetted every line of code that those
functions can reach ... but even to say that is to expose how fragile
the argument is.  Every time anybody touched numeric.c, or any code
reachable from there, we'd have to wonder whether we just introduced
a failure hazard for jsonpath.  That's not maintainable.  It's even
less maintainable if anybody decides they'd like to insert extension
hooks into any of that code.  I rather imagine that this could be
broken today by an ErrorContextCallback function, for example.
(That is, even today, "vetting every line" has to include vetting every
errcontext subroutine in the system, along with everything it can call.
Plus equivalent code in external PLs and so on.)

We've rejected code like this every time it was submitted to date,
and I don't see a reason to change that policy for jsonpath.

regards, tom lane



RE: [HACKERS] Cached plans and statement generalization

2019-01-28 Thread Nagaura, Ryohei
Hi,

Although I became your reviewer, it seems difficult to give feedback in this
CF. I will continue reviewing, so would you please update your patch?
Until then, I will review your current patch.

There is one question.
date_1.out, which appears to be a copy of date.out, contains trailing spaces
and inconsistent indentation. For example, lines 3368 and 3380 in your current
patch have
- a space at the end of each line
- different indentation.
This is also seen in date.out.
I'm not sure whether it is OK to add a new file containing the above just
because an existing file already does.
Is it OK?

Best regards,
-
Ryohei Nagaura


Re: Header checking failures on LLVM-less machines

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 20:21:49 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-01-28 19:51:22 -0500, Tom Lane wrote:
> >> I propose that we should fix this by making the whole bodies of those
> >> two headers be #ifdef USE_LLVM.
> 
> > Hm, I think it's sufficient that we error out if llvm was configured,
> > we've sufficient buildfarm machines running with it enabled.
> 
> That is not the point.  The point is that you've broken a developer
> tool.  I don't really care whether cpluspluscheck would work on
> some subset of buildfarm machines --- what I need is for it to work
> on *my* machine.  It is not any more okay for the llvm headers to fail
> like this than it would be for libxml- or openssl- or Windows-dependent
> headers to fail on machines lacking respective infrastructure.
> 
> (And yes, I realize that that means that cpluspluscheck can only
> check a subset of the header declarations on any particular machine.
> But there's no way around that.)

I don't think we are actually contradicting each other. The aim of that
error was to prevent pieces of code that aren't conditional on
--with-llvm from including llvmjit.h.  I was, for a moment and wrongly,
thinking that we could style the header in a way that'd make it both
safe for cpluspluscheck and still error in that case, but that was a
brainfart.  But even leaving that brainfart aside, I'm not arguing
against adding those include guards, so ...?

I think the raison d'etre for that error has shrunk considerably anyway
- it was introduced before the LLVM including/linking code was separated
into its own .so / directory.

Greetings,

Andres Freund



Re: jsonpath

2019-01-28 Thread Andres Freund
On 2019-01-29 04:17:33 +0300, Alexander Korotkov wrote:
> On Tue, Jan 29, 2019 at 4:03 AM Andres Freund  wrote:
> > On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > > + /*
> > > +  * It is safe to use here PG_TRY/PG_CATCH without subtransaction 
> > > because
> > > +  * no function called inside performs data modification.
> > > +  */
> > > + PG_TRY();
> > > + {
> > > + res = DirectFunctionCall2(func, ldatum, rdatum);
> > > + }
> > > + PG_CATCH();
> > > + {
> > > + int errcode = geterrcode();
> > > +
> > > + if (jspThrowErrors(cxt) ||
> > > + ERRCODE_TO_CATEGORY(errcode) != 
> > > ERRCODE_DATA_EXCEPTION)
> > > + PG_RE_THROW();
> > > +
> > > + MemoryContextSwitchTo(mcxt);
> > > + FlushErrorState();
> > > +
> > > + return jperError;
> > > + }
> > > + PG_END_TRY();
> >
> > FWIW, I still think this is a terrible idea and shouldn't be merged this
> > way. The likelihood of introducing subtle bugs seems way too high - even
> > if it's possibly not buggy today, who says that it's not going to be in
> > the future?
> 
> I'm probably not yet understanding all the risks this code have.  So far I 
> see:

I find these *more* than sufficient to not go to the PG_TRY/CATCH
approach.


> 1) One of functions called here performs database modification, while
> it wasn't suppose to.  So, it becomes not safe to skip subtransaction.

It's not just data modifications. Even just modifying some memory
structures that'd normally be invalidated by an xact abort's
invalidation processing isn't safe.


> 2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason.  So, it
> might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore.

It'd e.g. not surprise me very much if some OOM would end up translating
to ERRCODE_DATA_EXCEPTION, because some library function returned an
error due to ENOMEM.


> Could you complete this list?

3) The expression changed the current expression context, GUCs or any
   other such global variable. Without a proper subtrans reset this
   state isn't reverted.
4) The function acquires an LWLOCK, buffer reference, anything resowner
   owned. Skipping subtrans reset, that's not released in that
   moment. That's going to lead to potential hard deadlocks.
99) sigsetjmp is actually pretty expensive.
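
For context, a rough sketch of the subtransaction-wrapped variant that the
project-standard rule calls for, modeled on the pattern the PLs use (the
jsonpath-specific names func, ldatum, rdatum, cxt, jspThrowErrors and
jperError are taken from the quoted hunk; the rest is the usual
access/xact.h, utils/elog.h and utils/resowner.h machinery):

    MemoryContext oldcontext = CurrentMemoryContext;
    ResourceOwner oldowner = CurrentResourceOwner;

    BeginInternalSubTransaction(NULL);
    /* run the called function in the caller's memory context */
    MemoryContextSwitchTo(oldcontext);

    PG_TRY();
    {
        res = DirectFunctionCall2(func, ldatum, rdatum);

        /* commit the inner subtransaction and restore outer state */
        ReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;
    }
    PG_CATCH();
    {
        ErrorData  *edata;

        /* copy the error data into the caller's context */
        MemoryContextSwitchTo(oldcontext);
        edata = CopyErrorData();
        FlushErrorState();

        /* abort the inner subtransaction; this does the real cleanup */
        RollbackAndReleaseCurrentSubTransaction();
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;

        if (jspThrowErrors(cxt))
            ReThrowError(edata);

        FreeErrorData(edata);
        return jperError;
    }
    PG_END_TRY();

That costs a subtransaction per call, which is presumably the overhead the
patch was trying to avoid, but it is the form that survives arbitrary errors
thrown by the called code.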

Greetings,

Andres Freund



Re: jsonpath

2019-01-28 Thread Alexander Korotkov
On Tue, Jan 29, 2019 at 4:03 AM Andres Freund  wrote:
> On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > + /*
> > +  * It is safe to use here PG_TRY/PG_CATCH without subtransaction 
> > because
> > +  * no function called inside performs data modification.
> > +  */
> > + PG_TRY();
> > + {
> > + res = DirectFunctionCall2(func, ldatum, rdatum);
> > + }
> > + PG_CATCH();
> > + {
> > + int errcode = geterrcode();
> > +
> > + if (jspThrowErrors(cxt) ||
> > + ERRCODE_TO_CATEGORY(errcode) != 
> > ERRCODE_DATA_EXCEPTION)
> > + PG_RE_THROW();
> > +
> > + MemoryContextSwitchTo(mcxt);
> > + FlushErrorState();
> > +
> > + return jperError;
> > + }
> > + PG_END_TRY();

BTW, this code also looks... useless.  I can't see anywhere in numeric.c
that throws ERRCODE_DATA_EXCEPTION.  Nikita, what did we mean here?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Header checking failures on LLVM-less machines

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-28 19:51:22 -0500, Tom Lane wrote:
>> I propose that we should fix this by making the whole bodies of those
>> two headers be #ifdef USE_LLVM.

> Hm, I think it's sufficient that we error out if llvm was configured,
> we've sufficient buildfarm machines running with it enabled.

That is not the point.  The point is that you've broken a developer
tool.  I don't really care whether cpluspluscheck would work on
some subset of buildfarm machines --- what I need is for it to work
on *my* machine.  It is not any more okay for the llvm headers to fail
like this than it would be for libxml- or openssl- or Windows-dependent
headers to fail on machines lacking respective infrastructure.

(And yes, I realize that that means that cpluspluscheck can only
check a subset of the header declarations on any particular machine.
But there's no way around that.)

regards, tom lane



Re: jsonpath

2019-01-28 Thread Alexander Korotkov
On Tue, Jan 29, 2019 at 4:03 AM Andres Freund  wrote:
> On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> > + /*
> > +  * It is safe to use here PG_TRY/PG_CATCH without subtransaction 
> > because
> > +  * no function called inside performs data modification.
> > +  */
> > + PG_TRY();
> > + {
> > + res = DirectFunctionCall2(func, ldatum, rdatum);
> > + }
> > + PG_CATCH();
> > + {
> > + int errcode = geterrcode();
> > +
> > + if (jspThrowErrors(cxt) ||
> > + ERRCODE_TO_CATEGORY(errcode) != 
> > ERRCODE_DATA_EXCEPTION)
> > + PG_RE_THROW();
> > +
> > + MemoryContextSwitchTo(mcxt);
> > + FlushErrorState();
> > +
> > + return jperError;
> > + }
> > + PG_END_TRY();
>
> FWIW, I still think this is a terrible idea and shouldn't be merged this
> way. The likelihood of introducing subtle bugs seems way too high - even
> if it's possibly not buggy today, who says that it's not going to be in
> the future?

I'm probably not yet understanding all the risks this code have.  So far I see:
1) One of functions called here performs database modification, while
it wasn't suppose to.  So, it becomes not safe to skip subtransaction.
2) ERRCODE_DATA_EXCEPTION was thrown for unexpected reason.  So, it
might appear that ERRCODE_DATA_EXCEPTION is not safe to ignore.
Could you complete this list?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: jsonpath

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-29 04:00:19 +0300, Alexander Korotkov wrote:
> + /*
> +  * It is safe to use here PG_TRY/PG_CATCH without subtransaction because
> +  * no function called inside performs data modification.
> +  */
> + PG_TRY();
> + {
> + res = DirectFunctionCall2(func, ldatum, rdatum);
> + }
> + PG_CATCH();
> + {
> + int errcode = geterrcode();
> +
> + if (jspThrowErrors(cxt) ||
> + ERRCODE_TO_CATEGORY(errcode) != ERRCODE_DATA_EXCEPTION)
> + PG_RE_THROW();
> +
> + MemoryContextSwitchTo(mcxt);
> + FlushErrorState();
> +
> + return jperError;
> + }
> + PG_END_TRY();

FWIW, I still think this is a terrible idea and shouldn't be merged this
way. The likelihood of introducing subtle bugs seems way too high - even
if it's possibly not buggy today, who says that it's not going to be in
the future?

Greetings,

Andres Freund



Re: Header checking failures on LLVM-less machines

2019-01-28 Thread Andres Freund
On 2019-01-28 19:51:22 -0500, Tom Lane wrote:
> On 2019-01-28 11:19:21 -0500, Tom Lane wrote:
> > Since the LLVM stuff went in, src/tools/pginclude/cpluspluscheck
> > fails on my main devel machine, because
> > 
> > In file included from /tmp/cpluspluscheck.jobsM6/test.cpp:3:
> > ./src/include/jit/llvmjit_emit.h:13:25: error: llvm-c/Core.h: No such file 
> > or directory
> > 
> > and then there's a bunch of ensuing spewage due to missing typedefs
> > etc.  llvmjit.h has the same problem plus this additional pointless
> > aggravation:
> > 
> > In file included from /tmp/cpluspluscheck.jobsM6/test.cpp:3:
> > ./src/include/jit/llvmjit.h:15:2: error: #error "llvmjit.h should only be 
> > included by code dealing with llvm"
> > 
> > I propose that we should fix this by making the whole bodies of those
> > two headers be #ifdef USE_LLVM.

Hm, I think it's sufficient that we error out if llvm was configured,
we've sufficient buildfarm machines running with it enabled.


> Actually, it also fails on another machine that does have LLVM installed:
> 
> In file included from /tmp/cpluspluscheck.LqnoZj/test.cpp:3:
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_ptr_const(void*, LLVMTypeRef)':
> ./src/include/jit/llvmjit_emit.h:22:32: error: 'TypeSizeT' was not declared 
> in this scope
>   LLVMValueRef c = LLVMConstInt(TypeSizeT, (uintptr_t) ptr, false);
> ^
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_sizet_const(size_t)':
> ./src/include/jit/llvmjit_emit.h:78:22: error: 'TypeSizeT' was not declared 
> in this scope
>   return LLVMConstInt(TypeSizeT, i, false);
>   ^
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_sbool_const(bool)':
> ./src/include/jit/llvmjit_emit.h:87:22: error: 'TypeStorageBool' was not 
> declared in this scope
>   return LLVMConstInt(TypeStorageBool, (int) i, false);
>   ^~~
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_pbool_const(bool)':
> ./src/include/jit/llvmjit_emit.h:96:22: error: 'TypeParamBool' was not 
> declared in this scope
>   return LLVMConstInt(TypeParamBool, (int) i, false);
>   ^
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_mcxt_switch(LLVMModuleRef, LLVMBuilderRef, LLVMValueRef)':
> ./src/include/jit/llvmjit_emit.h:205:34: error: 'StructMemoryContextData' was 
> not declared in this scope
>cur = LLVMAddGlobal(mod, l_ptr(StructMemoryContextData), cmc);
>   ^~~
> ./src/include/jit/llvmjit_emit.h:205:34: note: suggested alternative: 
> 'MemoryContextData'
>cur = LLVMAddGlobal(mod, l_ptr(StructMemoryContextData), cmc);
>   ^~~
>   MemoryContextData
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_funcnullp(LLVMBuilderRef, LLVMValueRef, size_t)':
> ./src/include/jit/llvmjit_emit.h:223:9: error: 
> 'FIELDNO_FUNCTIONCALLINFODATA_ARGS' was not declared in this scope
>  FIELDNO_FUNCTIONCALLINFODATA_ARGS,
>  ^
> ./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
> l_funcvaluep(LLVMBuilderRef, LLVMValueRef, size_t)':
> ./src/include/jit/llvmjit_emit.h:241:9: error: 
> 'FIELDNO_FUNCTIONCALLINFODATA_ARGS' was not declared in this scope
>  FIELDNO_FUNCTIONCALLINFODATA_ARGS,
>  ^
> 
> Evidently, llvmjit_emit.h doesn't meet the project standard that says
> it should be includable standalone (to ensure that header inclusion
> order isn't important in .c files).  It looks like it needs to #include
> llvmjit.h and fmgr.h to satisfy these references.  Is there a good
> reason why this wasn't done?

Not really a good one - it's not really meant as an API, just a
collection of mini helpers for codegen, but there's no reason not to
make it self-sufficient.

Will make them so.

Greetings,

Andres Freund



Re: Header checking failures on LLVM-less machines

2019-01-28 Thread Tom Lane
I wrote:
> Since the LLVM stuff went in, src/tools/pginclude/cpluspluscheck
> fails on my main devel machine, because

Actually, it also fails on another machine that does have LLVM installed:

In file included from /tmp/cpluspluscheck.LqnoZj/test.cpp:3:
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_ptr_const(void*, LLVMTypeRef)':
./src/include/jit/llvmjit_emit.h:22:32: error: 'TypeSizeT' was not declared in 
this scope
  LLVMValueRef c = LLVMConstInt(TypeSizeT, (uintptr_t) ptr, false);
^
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_sizet_const(size_t)':
./src/include/jit/llvmjit_emit.h:78:22: error: 'TypeSizeT' was not declared in 
this scope
  return LLVMConstInt(TypeSizeT, i, false);
  ^
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_sbool_const(bool)':
./src/include/jit/llvmjit_emit.h:87:22: error: 'TypeStorageBool' was not 
declared in this scope
  return LLVMConstInt(TypeStorageBool, (int) i, false);
  ^~~
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_pbool_const(bool)':
./src/include/jit/llvmjit_emit.h:96:22: error: 'TypeParamBool' was not declared 
in this scope
  return LLVMConstInt(TypeParamBool, (int) i, false);
  ^
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_mcxt_switch(LLVMModuleRef, LLVMBuilderRef, LLVMValueRef)':
./src/include/jit/llvmjit_emit.h:205:34: error: 'StructMemoryContextData' was 
not declared in this scope
   cur = LLVMAddGlobal(mod, l_ptr(StructMemoryContextData), cmc);
  ^~~
./src/include/jit/llvmjit_emit.h:205:34: note: suggested alternative: 
'MemoryContextData'
   cur = LLVMAddGlobal(mod, l_ptr(StructMemoryContextData), cmc);
  ^~~
  MemoryContextData
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_funcnullp(LLVMBuilderRef, LLVMValueRef, size_t)':
./src/include/jit/llvmjit_emit.h:223:9: error: 
'FIELDNO_FUNCTIONCALLINFODATA_ARGS' was not declared in this scope
 FIELDNO_FUNCTIONCALLINFODATA_ARGS,
 ^
./src/include/jit/llvmjit_emit.h: In function 'LLVMOpaqueValue* 
l_funcvaluep(LLVMBuilderRef, LLVMValueRef, size_t)':
./src/include/jit/llvmjit_emit.h:241:9: error: 
'FIELDNO_FUNCTIONCALLINFODATA_ARGS' was not declared in this scope
 FIELDNO_FUNCTIONCALLINFODATA_ARGS,
 ^

Evidently, llvmjit_emit.h doesn't meet the project standard that says
it should be includable standalone (to ensure that header inclusion
order isn't important in .c files).  It looks like it needs to #include
llvmjit.h and fmgr.h to satisfy these references.  Is there a good
reason why this wasn't done?
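
(Presumably that just means adding, near the top of llvmjit_emit.h,
something like

    #include "fmgr.h"
    #include "jit/llvmjit.h"

so that the header compiles standalone no matter what the including .c file
pulled in first.)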

regards, tom lane



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 15:49:33 -0800, Andres Freund wrote:
> On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > I'm inclined to put back the
> > >LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> > >LockRelationForExtension(onerel, ExclusiveLock);
> > >UnlockRelationForExtension(onerel, ExclusiveLock);
> > >LockBufferForCleanup(buf);
> > >if (PageIsNew(page))
> > > dance regardless, just to get the buildfarm to green?
> > 
> > The buildfarm's got half a dozen reports now of a failure of this ilk,
> > so you'd better do something.
> 
> Yea, I was working on a patch. Was trying to come up with an explanation
> of how this can be realistically hit on the BF, but failed.  I've pushed
> something now, let's see whether that fixes it.

It has not. Given that I don't understand what's happening here I'm
going to revert both commits unless I figure it out in the next ~30min.

Greetings,

Andres Freund



RE: pg_upgrade: Pass -j down to vacuumdb

2019-01-28 Thread Jamison, Kirk
Hi Jesper,

Thanks for updating your patches quickly.

>On 1/28/19 3:50 PM, Peter Eisentraut wrote:
>>> Done, and v4 attached.
>> 
>> I would drop the changes in pgupgrade.sgml.  We don't need to explain 
>> what doesn't happen, when nobody before now ever thought that it would 
>> happen.
>> 
>> Also, we don't need the versioning stuff.  The new cluster in 
>> pg_upgrade is always of the current version, and we know what that one 
>> supports.
>> 

>Done.

I just checked the patch.
As advised, you removed the versioning and specified --jobs.
The patch still applies, builds, and passes the tests successfully.

I'd argue, though, that it's helpful to have a short piece of documentation
clarifying that the option isn't passed down to vacuumdb by default. I think
the initial doc change from the previous patch would be good:
+ machine. Note, that this option isn't passed to the
+ vacuumdb application by default.
Just remove the comma from "Note, that...".


Regards,
Kirk Jamison


Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread Amit Kapila
On Tue, Jan 29, 2019 at 12:37 AM John Naylor
 wrote:
>
> On Mon, Jan 28, 2019 at 12:10 PM Amit Kapila  wrote:
>
> > 2.
> > @@ -15,13 +15,9 @@
> >  SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS 
> > main_100;
> >  ERROR:  block number 100 is out of range for relation "test_rel_forks"
> >  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> > - fsm_0
> > 
> > -  8192
> > -(1 row)
> > -
> > +ERROR:  could not open file "base/50769/50798_fsm": No such file or 
> > directory
> >  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> > -ERROR:  block number 10 is out of range for relation "test_rel_forks"
> > +ERROR:  could not open file "base/50769/50798_fsm": No such file or 
> > directory
> >
> > This indicates that even though the Vacuum is executed, but the FSM
> > doesn't get created.  This could be due to different BLCKSZ, but the
> > failed machines don't seem to have a non-default value of it.  I am
> > not sure why this could happen, maybe we need to check once in the
> > failed regression database to see the size of relation?
>
> I'm also having a hard time imagining why this failed. Just in case,
> we could return ctid in a plpgsql loop and stop as soon as we see the
> 5th block. I've done that for some tests during development and is a
> safer method anyway.
>

I think we can devise some concrete way, but it is better that we first
try to understand why it failed; otherwise there is always a chance that
we will repeat the mistake in some other case.  I think we have no
choice but to ask the buildfarm owners to either give us access so we
can see what happens, or help us investigate the problem.  The four
buildfarm animals where it failed were lapwing, locust, dromedary, and
prairiedog.  Among these, the owner of the last two is Tom Lane; the
others I don't recognize.  Tom, Andrew, can you help us get access to
one of those four?  Yet another alternative is that an owner applies the
attached patch (the same as what got committed), or resets to commit
ac88d2962a, executes the statements below, and shares the results:

CREATE EXTENSION pageinspect;

CREATE TABLE test_rel_forks (a int);
INSERT INTO test_rel_forks SELECT i from generate_series(1,1000) i;
VACUUM test_rel_forks;
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;

SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;

SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;

If the above statements give error: "ERROR:  could not open file ...", then run:
Analyze test_rel_forks;
Select oid, relname, relpages, reltuples from pg_class where relname
like 'test%';

The result of the above tests will tell us whether there are 5 pages
in the table or not.  If the table contains 5 pages and throws an
error, then there is some bug in our code; otherwise, there is
something specific to those systems where the above insert doesn't
result in 5 pages.

> > I think here you need to clear the map if it exists or clear it
> > unconditionally, the earlier one would be better.
>
> Ok, maybe all callers should call it unconditonally, but within the
> function, check "if (FSM_LOCAL_MAP_EXISTS)"?
>

Sounds sensible.  I think we should try to reproduce these failures;
for example, for the pgbench failure, we can try the same test with
more clients.
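
To illustrate the clearing approach agreed above: FSM_LOCAL_MAP_EXISTS is
the macro John mentions, while the fsm_local_map structure and its field
names here are only hypothetical stand-ins.

    /*
     * Sketch: callers invoke this unconditionally after relation extension;
     * the function itself is a no-op when no local map was ever set up.
     */
    void
    FSMClearLocalMap(void)
    {
        if (FSM_LOCAL_MAP_EXISTS)
        {
            fsm_local_map.nblocks = 0;
            memset(fsm_local_map.map, 0, sizeof(fsm_local_map.map));
        }
    }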

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > I'm inclined to put back the
> >LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> >LockRelationForExtension(onerel, ExclusiveLock);
> >UnlockRelationForExtension(onerel, ExclusiveLock);
> >LockBufferForCleanup(buf);
> >if (PageIsNew(page))
> > dance regardless, just to get the buildfarm to green?
> 
> The buildfarm's got half a dozen reports now of a failure of this ilk,
> so you'd better do something.

Yea, I was working on a patch. Was trying to come up with an explanation
of how this can be realistically hit on the BF, but failed.  I've pushed
something now, let's see whether that fixes it.


> > But I do wonder if we should just make hio.c cope with this instead.
> 
> Probably should not try to go that way under time presssure.

Definitely.

Greetings,

Andres Freund



Re: jsonpath

2019-01-28 Thread Nikita Glukhov

On 28.01.2019 16:50, Tomas Vondra wrote:


On 1/28/19 1:22 AM, Alexander Korotkov wrote:

  * I decided to change behavior of jsonb_path_match() to throw as less
errors as possible.  The reason is that it's used to implement
potentially (patch is pending) indexable operator.  Index scan is not
always capable to throw as many errors and sequential scan.  So, it's
better to not introduce extra possible index scan and sequential scan
results divergence.

Hmmm, can you elaborate a bit more? Which errors were thrown before and
are not thrown with the current patch version?


In the previous version of the patch, jsonb_path_match() threw an error when
the jsonpath did not return a singleton value; in the latest version, NULL is
returned in such cases.  This problem arises because we cannot guarantee at
compile time that the jsonpath expression used in jsonb_path_match() is a
predicate.  Per the standard, predicates can return only True, False, and
Unknown (errors that occur during execution of their operands are transformed
into Unknown values), so predicates cannot throw errors, and there are no
problems with errors.

GIN does not attempt to search non-predicate expressions, so there may be no
problem even if we throw a "not a singleton" error.


Here I want to point out that the ability to use predicates at the root of a
jsonpath expression is our extension to the standard, created specifically for
the operator @@.  Per the standard, predicates are allowed only in filters.
Without this extension we are still able to rewrite @@ using @?:
jsonb @@ 'predicate'    is equivalent to
jsonb @? '$ ? (predicate)'
but such an @? expression is a bit slower to execute and a bit more verbose to
write.

If we introduced a special type 'jsonpath_predicate', we could solve the
problem by checking the type of the jsonpath expression at compile time.


Another problem with error handling is that jsonb_path_query*() functions
always throw SQL/JSON errors and there is no easy and effective way to emulate
NULL ON ERROR behavior, which is used by default in SQL/JSON functions.  So I
think it's worth trying to add some kind of flag 'throwErrors' to
jsonb_path_query*() functions.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> I'm inclined to put back the
>LockBuffer(buf, BUFFER_LOCK_UNLOCK);
>LockRelationForExtension(onerel, ExclusiveLock);
>UnlockRelationForExtension(onerel, ExclusiveLock);
>LockBufferForCleanup(buf);
>if (PageIsNew(page))
> dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk,
so you'd better do something.

> But I do wonder if we should just make hio.c cope with this instead.

Probably should not try to go that way under time presssure.

regards, tom lane



Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-28 Thread Tom Lane
Mark Dilger  writes:
>> The swtich in src/backend/parser/analyze.c circa line 2819 should
>> probably have an explicit case for RTE_RESULT rather than just a
>> comment and allowing the default to log "unrecognized RTE type",
>> since it's not really unrecognized, just unexpected.  But I'm not
>> too exercised about that, and won't argue if you want to leave it
>> as is.

Meh --- I doubt we need two different "can't happen" messages there.
The reason I treated this differently from some other places is that
transformLockingClause is only used during parsing, when there certainly
shouldn't be any RTE_RESULT RTEs present.  Some of the other functions
in that file are also called from outside the parser, so that it's
less certain they couldn't see a RTE_RESULT, so I added explicit
errors for them.

There's been some talk of having more uniform handling of switches
on enums, which might change the calculus here (i.e. we might not want
to have a default: case at all).  But I don't feel a need to add code
to transformLockingClause till we have that.

>> Similarly, in src/backend/commands/explain.c, should there be a
>> case for T_Result in ExplainTargetRel's switch circa line 2974?
>> I'm not sure given your design whether this could ever be relevant,
>> but I noticed that T_Result / RTE_RELATION isn't handled here.

We don't get to that function for a T_Result plan (cf. switch in
ExplainNode), so it'd just be dead code.

> Ok, version 6 of the patch applies cleanly, compiles, and passes
> all tests for me on my platform (mac os x).

Again, thanks for reviewing!

Pushed now.  I did yield to popular demand and change the temp table's
name in join.sql to be "onerow" instead of "dual" ;-)

regards, tom lane



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 14:10:55 -0800, Andres Freund wrote:
> So, I'd pushed the latest version. And longfin came back with an
> interesting error:
> 
> ERROR:  page 135 of relation "pg_class" should be empty but is not
> 
> The only way I can currently imagine this happening is that there's a
> concurrent vacuum that discovers the page is empty, enters it into the
> FSM (which now isn't happening under an extension lock anymore), and
> then a separate backend starts to actually use that buffer.  That seems
> tight but possible.  Seems we need to re-add the
> LockRelationForExtension/UnlockRelationForExtension() logic :(

Hm, but thinking about this, isn't this a risk independent of this
change? The FSM isn't WAL logged, and given it's tree-ish structure it's
possible to "rediscover" free space (e.g. FreeSpaceMapVacuum()). ISTM
that after a crash the FSM might point to free space that doesn't
actually exist, and is rediscovered after independent changes.  Not sure
if that's actually a realistic issue.

I'm inclined to put back the
   LockBuffer(buf, BUFFER_LOCK_UNLOCK);
   LockRelationForExtension(onerel, ExclusiveLock);
   UnlockRelationForExtension(onerel, ExclusiveLock);
   LockBufferForCleanup(buf);
   if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

But I do wonder if we should just make hio.c cope with this instead.

Greetings,

Andres Freund



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Andres Freund
Hi,

So, I'd pushed the latest version. And longfin came back with an
interesting error:

ERROR:  page 135 of relation "pg_class" should be empty but is not

The only way I can currently imagine this happening is that there's a
concurrent vacuum that discovers the page is empty, enters it into the
FSM (which now isn't happening under an extension lock anymore), and
then a separate backend starts to actually use that buffer.  That seems
tight but possible.  Seems we need to re-add the
LockRelationForExtension/UnlockRelationForExtension() logic :(

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Tom Lane
Peter Eisentraut  writes:
> On 28/01/2019 21:35, Tom Lane wrote:
>> Conceivably we could make it work without the parens:
>> ...

> Or put it at the end?
> WITH ctename AS ( query ) MATERIALIZED

Yeah, I thought about that too, but it doesn't seem like an improvement.
If the query is very long (which isn't unlikely) I think people would
prefer to see the option(s) up front.

regards, tom lane



Re: Install JIT headers

2019-01-28 Thread Andres Freund
On 2019-01-22 11:42:35 -0800, Andres Freund wrote:
> Hi,
> 
> On 2019-01-22 11:08:40 -0800, Donald Dong wrote:
> > In the standard_planner, we set the proper JIT flags on the resulting
> > PlannedStmt node. We also offered a planer hook so extensions can
> > add customized planners. The JIT flags in jit/jit.h however, is
> > not present on the system. I think it's probably better to
> > install the jit headers?
> > 
> > 
> > diff --git a/src/include/Makefile b/src/include/Makefile
> > index 6bdfd7db91..041702809e 100644
> > --- a/src/include/Makefile
> > +++ b/src/include/Makefile
> > @@ -19,7 +19,7 @@ all: pg_config.h pg_config_ext.h pg_config_os.h
> >  # Subdirectories containing installable headers
> >  SUBDIRS = access bootstrap catalog commands common datatype \
> > executor fe_utils foreign \
> > -   lib libpq mb nodes optimizer parser partitioning postmaster \
> > +   jit lib libpq mb nodes optimizer parser partitioning postmaster \
> > regex replication rewrite \
> > statistics storage tcop snowball snowball/libstemmer tsearch \
> > tsearch/dicts utils port port/atomics port/win32 port/win32_msvc \
> 
> Seems like a good idea. I think we ought to backpatch that to 11?  Will
> do tomorrow if nobody protests.

Pushed to both. Thanks for the patch!

Thanks to Bruce for reminding me of the patch, too.

Greetings,

Andres Freund



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Peter Eisentraut
On 28/01/2019 21:35, Tom Lane wrote:
> Conceivably we could make it work without the parens:
> 
> WITH ctename AS [ option = value [ ,  ] ] ( query  )
> 
> which for the immediate feature I'd be tempted to spell as
> 
> WITH ctename AS [ materialize = on/off ] ( query ... )
> 
> I think the only reason the syntax is MATERIALIZED with a D is that
> that's already a keyword; it reads a bit awkwardly IMO.  But if we
> were accepting a ColId there, there'd be room to adjust the spelling.

Or put it at the end?

WITH ctename AS ( query ) MATERIALIZED

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposed refactoring of planner header files

2019-01-28 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 28, 2019 at 3:17 PM Tom Lane  wrote:
>> I'm really unhappy that force_parallel_mode and
>> parallel_leader_participation are being treated as planner GUCs.

> The only use of parallel_leader_participation at plan time seems to be
> to twiddle the costing, and the use of it in the executor is to decide
> whether to have the leader participate.  So if the values differ,
> you'll get a plan running a behavior for which plan selection was not
> optimized.  I don't know whether it's useful to intentionally allow
> this so that you can see how the same plan behaves under the other
> setting, or whether it's just a wart we'd be better off without.  It
> might be confusing, though, if you change the setting and it doesn't
> force a replan.

Well, that puts it at the ill-considered end of the spectrum instead
of the outright-broken end, but I still say it's a bad idea.  Planner
GUCs ought to control the produced plan, not other behaviors.

regards, tom lane



Re: Speeding up text_position_next with multibyte encodings

2019-01-28 Thread Bruce Momjian
On Fri, Jan 25, 2019 at 04:33:54PM +0200, Heikki Linnakangas wrote:
> On 15/01/2019 02:52, John Naylor wrote:
> >The majority of cases are measurably faster, and the best case is at
> >least 20x faster. On the whole I'd say this patch is a performance win
> >even without further optimization. I'm marking it ready for committer.
> 
> I read through the patch one more time, tweaked the comments a little bit,
> and committed. Thanks for the review!
> 
> I did a little profiling of the worst case, where this is slower than the
> old approach. There's a lot of function call overhead coming from walking
> the string with pg_mblen(). That could be improved. If we inlined pg_mblen()
> into loop, it becomes much faster, and I think this code would be faster
> even in the worst case. (Except for the very worst cases, where hash table
> with the new code happens to have a collision at a different point than
> before, but that doesn't seem like a fair comparison.)
> 
> I think this is good enough as it is, but if I have the time, I'm going to
> try optimizing the pg_mblen() loop, as well as similar loops e.g. in
> pg_mbstrlen(). Or if someone else wants to give that a go, feel free.

It might be valuable to just inline the UTF8 case.
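
As a sketch of that idea (the helper name is made up; the classification of
lead bytes mirrors what pg_utf_mblen() already does), the hot loop could use
a static inline fast path instead of going through pg_mblen() for every
character:

    /* Hypothetical inline fast path for UTF-8 character byte length. */
    static inline int
    utf8_byte_len(unsigned char firstbyte)
    {
        if ((firstbyte & 0x80) == 0)
            return 1;
        else if ((firstbyte & 0xe0) == 0xc0)
            return 2;
        else if ((firstbyte & 0xf0) == 0xe0)
            return 3;
        else if ((firstbyte & 0xf8) == 0xf0)
            return 4;
        return 1;               /* invalid lead byte: advance one byte */
    }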

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: backslash-dot quoting in COPY CSV

2019-01-28 Thread Bruce Momjian
On Mon, Jan 28, 2019 at 04:06:17PM +0100, Daniel Verite wrote:
>   Michael Paquier wrote:
> 
> > In src/bin/psql/copy.c, handleCopyIn():
> > 
> > /*
> > * This code erroneously assumes '\.' on a line alone
> > * inside a quoted CSV string terminates the \copy.
> > *
> > http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org
> > */
> > if (strcmp(buf, "\\.\n") == 0 ||
> >strcmp(buf, "\\.\r\n") == 0)
> > {
> >copydone = true;
> >break;
> > }
> 
> Indeed, it's exactly that problem.
> And there's the related problem that it derails the input stream
> in a way that lines of data become commands, but that one is
> not specific to that particular error.
> 
> For the backslash-dot in a quoted string, the root cause is
> that psql is not aware that the contents are CSV so it can't
> parse them properly.
> I can think of several ways of working around that, more or less
> inelegant:
> 
> - the end of data could be expressed as a length (in number of lines
> for instance) instead of an in-data marker.
> 
> - the end of data could be configurable, as in the MIME structure of
> multipart mail messages, where a part is ended by a "boundary",
> line, generally a long randomly generated string. This boundary
> would have to be known to psql through setting a dedicated
> variable or command.
> 
> - COPY as the SQL command could have the boundary option
> for data fed through its STDIN. This could neutralize the
> special role of backslash-dot in general, not just in quoted fields,
> since the necessity to quote backslash-dot is a wart anyway.

Well, these all kind of require a change to the COPY format, which
hasn't changed in many years.

> - psql could be told somehow that the next piece of inline data is in
> the CSV format, and then pass it through a CSV parser.

That might be the cleanest solution, but how would we actually input
multi-line data in CSV mode with \. alone on a line?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: backslash-dot quoting in COPY CSV

2019-01-28 Thread Bruce Momjian
On Fri, Jan 25, 2019 at 01:01:22PM +0100, Daniel Verite wrote:
>   Bruce Momjian wrote:
> 
> > but I am able to see the failure using STDIN:
> > 
> > COPY test FROM STDIN CSV;
> > Enter data to be copied followed by a newline.
> > End with a backslash and a period on a line by itself, or an EOF
> > signal.
> > "foo
> > \.
> > ERROR:  unterminated CSV quoted field
> > CONTEXT:  COPY test, line 1: ""foo
> > 
> > This seems like a bug to me.  Looking at the code, psql issues the
> > prompts for STDIN, but when it sees \. alone on a line, it has no idea
> > you are in a quoted CSV string, so it thinks the copy is done and sends
> > the result to the server.  I can't see an easy way to fix this.  I guess
> > we could document it.
> 
> Thanks for looking into this.  
> 
> \copy from file with csv is also affected since it uses COPY FROM
> STDIN behind the scene. The case of embedded data looks more worrying
> because psql will execute the data following \. as if they were
> SQL statements.
> 
> ISTM that only ON_ERROR_STOP=on prevents the risk of SQL injection
> in that scenario, but it's off by default.

You are correct that someone whose data contains SQL commands might be
able to execute those commands on restore.  pg_dump doesn't use
CSV, and this only affects STDIN, not files or PROGRAM input.  I think
the question is how many people are using CSV/STDIN for insecure data
loads?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2019-01-28 Thread Peter Geoghegan
On Mon, Jan 28, 2019 at 7:32 AM Heikki Linnakangas  wrote:
> I spent some time first trying to understand the current algorithm, and
> then rewriting it in a way that I find easier to understand. I came up
> with the attached. I think it optimizes for the same goals as your
> patch, but the approach  is quite different. At a very high level, I
> believe the goals can be described as:
>
> 1. Find out how much suffix truncation is possible, i.e. how many key
> columns can be truncated away, in the best case, among all possible ways
> to split the page.
>
> 2. Among all the splits that achieve that optimum suffix truncation,
> find the one with smallest "delta".

Thanks for going to the trouble of implementing what you have in mind!

I agree that the code that I wrote within nbtsplitloc.c is very
subtle, and I do think that I have further work to do to make its
design clearer. I think that this high level description of the goals
of the algorithm is inaccurate in subtle but important ways, though.
Hopefully there will be a way of making it more understandable that
preserves certain important characteristics. If you had my test
cases/data that would probably help you a lot (more on that later).

The algorithm I came up with almost always does these two things in
the opposite order, with each considered in clearly separate phases.
There are good reasons for this. We start with the same criteria as
the old algorithm. We assemble a small array of candidate split
points, rather than one split point, but otherwise it's almost exactly
the same (the array is sorted by delta). Then, at the very end, we
iterate through the small array to find the best choice for suffix
truncation. IOW, we only consider suffix truncation as a *secondary*
goal. The delta is still by far the most important thing 99%+ of the
time. I assume it's fairly rare to not have two distinct tuples within
9 or so tuples of the delta-wise optimal split position -- 99% is
probably a low estimate, at least in OLTP apps or within unique
indexes. I see that you do something with a "good enough" delta that
seems like it also makes delta the most important thing, but that
doesn't appear to be, uh, good enough. ;-)

Now, it's true that my approach does occasionally work in a way close
to what you describe above -- it does this when we give up on default
mode and check "how much suffix truncation is possible?" exhaustively,
for every possible candidate split point. "Many duplicates" mode kicks
in when we need to be aggressive about suffix truncation. Even then,
the exact goals are different to what you have in mind in subtle but
important ways. While "truncating away the heap TID" isn't really a
special case in other places, it is a special case for my version of
nbtsplitloc.c, which more or less avoids it at all costs. Truncating
away heap TID is more important than truncating away any other
attribute by a *huge* margin. Many duplicates mode *only* specifically
cares about truncating the final TID attribute. That is the only thing
that is ever treated as more important than delta, though even there
we don't forget about delta entirely. That is, we assume that the
"perfect penalty" is nkeyatts when in many duplicates mode, because we
don't care about suffix truncation beyond heap TID truncation by then.
So, if we find 5 split points out of 250 in the final array that avoid
appending heap TID, we use the earliest/lowest delta out of those 5.
We're not going to try to maximize the number of *additional*
attributes that get truncated, because that can make the leaf pages
unbalanced in an *unbounded* way. None of these 5 split points are
"good enough", but the distinction between their deltas still matters
a lot. We strongly prefer a split with a *mediocre* delta to a split
with a *terrible* delta -- a bigger high key is the least of our
worries here. (I made similar mistakes myself months ago, BTW.)

Your version of the algorithm makes a test case of mine (UK land
registry test case [1]) go from having an index that's 1101 MB with my
version of the algorithm/patch and 1329 MB on the master branch to an
index that's 3030 MB in size. I think that this happens because it
effectively fails to give any consideration to delta at all at certain
points. On leaf pages with lots of unique keys, your algorithm does
about as well as mine because all possible split points look equally
good suffix-truncation-wise, plus you have the "good enough" test, so
delta isn't ignored. I think that your algorithm also works well when
there are many duplicates but only one non-TID index column, since the
heap TID truncation versus other truncation issue does not arise. The
test case I used is an index on "(county, city, locality)", though --
low cardinality, but more than a single column. That's a *combination*
of two separate considerations, that seem to get conflated. I don't
think that you can avoid doing "a second pass" in some sense, because
these really are separate considerations.

Re: backslash-dot quoting in COPY CSV

2019-01-28 Thread Bruce Momjian
On Sun, Jan 27, 2019 at 10:10:36PM +0900, Michael Paquier wrote:
> On Thu, Jan 24, 2019 at 10:09:30PM -0500, Bruce Momjian wrote:
> > This seems like a bug to me.  Looking at the code, psql issues the
> > prompts for STDIN, but when it sees \. alone on a line, it has no idea
> > you are in a quoted CSV string, so it thinks the copy is done and sends
> > the result to the server.  I can't see an easy way to fix this.  I guess
> > we could document it.
> 
> In src/bin/psql/copy.c, handleCopyIn():
> 
> /*
>  * This code erroneously assumes '\.' on a line alone
>  * inside a quoted CSV string terminates the \copy.
>  * 
> http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org
>  */
> if (strcmp(buf, "\\.\n") == 0 ||
> strcmp(buf, "\\.\r\n") == 0)
> {
> copydone = true;
> break;
> }
> 
> This story pops up from time to time..

The killer is I committed this C comment six years ago, and didn't
remember it.  :-O

commit 361b94c4b98b85b19b850cff37be76d1f6d4f8f7
Author: Bruce Momjian 
Date:   Thu Jul 4 13:09:52 2013 -0400

Add C comment about \copy bug in CSV mode
Comment: This code erroneously assumes '\.' on a line alone inside a
quoted CSV string terminates the \copy.

http://www.postgresql.org/message-id/e1tdnvq-0001ju...@wrigleys.postgresql.org

Glad I mentioned the URL, at least.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Proposed refactoring of planner header files

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-28 13:02:11 -0800, Andres Freund wrote:
>> It's not required by C99, it however is required by C11. But a lot of
>> compilers have allowed it as an extension for a long time (like before
>> C99), unless suppressed by some option.

> Hm, it's only in gcc 4.6, so that's probably too recent.

Yeah, I tried it with RHEL6's gcc 4.4.7, and it doesn't work
(and AFAICS there is no option that would make it work).  A lot
of the buildfarm is running compilers older than that.

I fear we're still 10 years away from being able to demand C11
support ...

regards, tom lane



Re: Proposed refactoring of planner header files

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-28 15:50:22 -0500, Tom Lane wrote:
>> Andres Freund  writes:
>>> Ugh, isn't it nicer to just use the underlying struct type instead of
>>> that?

>> No, because that'd mean that anyplace relying on optimizer.h would also
>> have to write "struct PlannerInfo" rather than "PlannerInfo"; the
>> effects wouldn't be limited to the header itself.

> Why? It can be called with the typedef'd version, or not.

Because I don't want users of the header to have to declare, say, local
variables as "struct PlannerInfo *root" instead of "PlannerInfo *root".
The former is not project style and I will not accept forcing that in
order to save a couple of #ifdefs in headers.  I don't actually understand
what you find so ugly about it.

One idea that would save a lot of random "struct foo" hacks in headers
is to allow nodes.h to include "typedef struct MyNode MyNode" for any
recognized node type.  We could either do that just for nodes we've
found it useful for, or pre-emptively for the whole list.
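
For illustration, the include-order-safe guard discussed here amounts to
putting something like this in both headers (the macro name is only an
example):

    #ifndef HAVE_PLANNERINFO_TYPEDEF
    typedef struct PlannerInfo PlannerInfo;
    #define HAVE_PLANNERINFO_TYPEDEF 1
    #endif

Whichever header is included first creates the typedef, the second one skips
it, and callers keep writing "PlannerInfo *root" without pulling in
relation.h.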

regards, tom lane



Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-15 17:35:21 -0500, Tom Lane wrote:
> Andres Freund  writes:
> >> Looking at the surrounding code made me wonder about the wisdom of
> >> entering empty pages as all-visible and all-frozen into the VM. That'll
> >> mean we'll never re-discover them on a primary, after promotion. There's
> >> no mechanism to add such pages to the FSM on a standby (in contrast to
> >> e.g. pages where tuples are modified), as vacuum will never visit that
> >> page again.  Now obviously it has the advantage of avoiding
> >> re-processing the page in the next vacuum, but is that really an
> >> important goal? If there's frequent vacuums, there got to be a fair
> >> amount of modifications, which ought to lead to re-use of such pages at
> >> a not too far away point?
> 
> > Any comments on the approach in this patch?
> 
> I agree with the concept of postponing page init till we're actually
> going to do something with the page.  However, the patch seems to need
> some polish:

Yea, as I'd written in an earlier message, it was really meant as a
prototype to see whether there's buyin to the change.


> * There's a comment in RelationAddExtraBlocks, just above where you
> changed, that
> 
>  * Extend by one page.  This should generally match the main-line
>  * extension code in RelationGetBufferForTuple, except that we hold
>  * the relation extension lock throughout.
> 
> This seems to be falsified by this patch, in that one of the two paths
> does PageInit and the other doesn't.
> 
> * s/unitialized pages/uninitialized pages/
> 
> * This bit in vacuumlazy seems unnecessarily confusing:
> 
> +    Size        freespace = 0;
> ...
> +    if (GetRecordedFreeSpace(onerel, blkno) == 0)
> +        freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
> +
> +    if (freespace > 0)
> +    {
> +        RecordPageWithFreeSpace(onerel, blkno, freespace);
> 
> I'd write that as just
> 
> +    if (GetRecordedFreeSpace(onerel, blkno) == 0)
> +    {
> +        Size    freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
> +        RecordPageWithFreeSpace(onerel, blkno, freespace);

Fixed, thanks for looking!


> I tend to agree that the DEBUG message isn't very necessary, or at least
> could be lower than DEBUG1.

After a bit of back and forth waffling, I've removed it now.

Besides a fair number of comment changes, the latest version has just one
functional change: lazy_check_needs_freeze() no longer reports that new pages
require freezing.  If we can't get a cleanup lock on such a page, it's about
to be written to anyway, and we'd do no more than enter it into the FSM.
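
Presumably that boils down to something like the following early in
lazy_check_needs_freeze() (sketch only):

    /* New pages contain no tuples, hence nothing that could need freezing. */
    if (PageIsNew(page))
        return false;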

Greetings,

Andres Freund



Re: Built-in connection pooler

2019-01-28 Thread Dimitri Fontaine
Hi,

Bruce Momjian  writes:
> It is nice it is a smaller patch.  Please remind me of the performance
> advantages of this patch.

The patch as it stands is mostly helpful in those situations:

  - application server(s) start e.g. 2000 connections at start-up and
then use them depending on user traffic

It's then easy to see that if we would only fork as many backends as
we need, while having accepted the 2000 connections without doing
anything about them, we would be in a much better position than when
we fork 2000 unused backends.

  - application is partially compatible with pgbouncer transaction
pooling mode

Then in that case, you would need to run with pgbouncer in session
mode. This happens when the application code is using session level
SQL commands/objects, such as prepared statements, temporary tables,
or session-level GUCs settings.

With the attached patch, if the application sessions profiles are
mixed, then you dynamically get the benefits of transaction pooling
mode for those sessions which are not “tainting” the backend, and
session pooling mode for the others.

It means that it's then possible to find the most often used session
and fix that one for immediate benefits, leaving the rest of the
code alone. If it turns out that 80% of your application sessions
are the same code-path and you can make this one “transaction
pooling” compatible, then you most probably are fixing (up to) 80%
of your connection-related problems in production.

  - applications that use a very high number of concurrent sessions

In that case, you can either set your connection pooling the same as
max_connection and see no benefits (and hopefully no regressions
either), or set a lower number of backends serving a very high
number of connections, and have sessions waiting their turn at the
“proxy” stage.

This is a kind of naive Admission Control implementation where it's
better to have active clients in the system wait in line consuming
as few resources as possible. Here, in the proxy. It could be done
with pgbouncer already, this patch gives a stop-gap in PostgreSQL
itself for those use-cases.

It would be mostly useful to do that when you have queries that benefit
from parallel workers. In that case, controlling the number of active
backends forked at any time to serve user queries allows better use of
the available parallel workers.

In other cases, it's important to measure and accept the possible
performance cost of running a proxy server between the client connection
and the PostgreSQL backend process. I believe the numbers shown in the
previous email by Konstantin are about showing the kind of impact you
can see when using the patch in a use-case where it's not meant to be
helping much, if at all.

Regards,
-- 
dim



Re: INSTALL file

2019-01-28 Thread Peter Eisentraut
On 03/11/2018 00:14, Andreas 'ads' Scherbaum wrote:
> GNU make, version 3.8 or newer
> ISO/ANSI C compiler (at least C99-compliant)
> Flex 2.5.31 or later, and Bison 1.875 or later (for building from git)
> Perl 5.8.3 (for building from git)

I'm not in favor of listing all these versions here.  It's one more
thing to keep updated.  The version requirements are not outrageous, so
we can assume that someone with a reasonably up-to-date development
machine has appropriate versions.

I'm also not convinced that this list is actually complete.  From an
"empty" start you'd typically need at least zlib and readline on top of
those.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposed refactoring of planner header files

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 13:02:11 -0800, Andres Freund wrote:
> It's not required by C99, it however is required by C11. But a lot of
> compilers have allowed it as an extension for a long time (like before
> C99), unless suppressed by some option. I think that's partially because
> C++ has allowed it for longer.  I don't know how many of the BF
> compilers could be made to accept that - I'd be very surprised if yours
> couldn't.

Hm, it's only in gcc 4.6, so that's probably too recent.

Greetings,

Andres Freund



Re: Built-in connection pooler

2019-01-28 Thread Bruce Momjian
On Thu, Jan 24, 2019 at 08:14:41PM +0300, Konstantin Knizhnik wrote:
> The main differences with pgbouncer&K are that:
> 
> 1. It is embedded and requires no extra steps for installation and
> configurations.
> 2. It is not single threaded (no bottleneck)
> 3. It supports all clients (if client needs session semantic, then it will be
> implicitly given dedicated backend)
> 
> 
> Some performance results (pgbench -S -n):
> 
> ┌──────────────┬────────┬───────────┬─────────┬────────────┐
> │ #Connections │  Proxy │ Proxy/SSL │  Direct │ Direct/SSL │
> ├──────────────┼────────┼───────────┼─────────┼────────────┤
> │            1 │  13752 │     12396 │   17443 │      15762 │
> │           10 │  53415 │     59615 │   68334 │      85885 │
> │         1000 │  60152 │     20445 │   60003 │      24047 │
> └──────────────┴────────┴───────────┴─────────┴────────────┘

It is nice it is a smaller patch.  Please remind me of the performance
advantages of this patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Proposed refactoring of planner header files

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 15:50:22 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2019-01-28 15:17:19 -0500, Tom Lane wrote:
> >> Since I was intentionally trying to limit what optimizer.h pulls in,
> >> and in particular not let it include relation.h, I needed an opaque
> >> typedef for PlannerInfo.  On the other hand relation.h also needs to
> >> typedef that.  I fixed that with a method that we've not used in our
> >> code AFAIK, but is really common in system headers: there's a #define
> >> symbol to remember whether we've created the typedef, and including
> >> both headers in either order will work fine.
> 
> > Ugh, isn't it nicer to just use the underlying struct type instead of
> > that?
> 
> No, because that'd mean that anyplace relying on optimizer.h would also
> have to write "struct PlannerInfo" rather than "PlannerInfo"; the
> effects wouldn't be limited to the header itself.

Why? It can be called with the typedef'd version, or not.  Unless the
calling code has on-stack pointers to it, it ought never to deal with
PlannerInfo vs. struct PlannerInfo.  In a lot of cases the code calling
the function will simply get the PlannerInfo from somewhere else, which
will itself use either the typedef'd version or the struct form.


> > Or alternatively we could expand our compiler demands to require
> > that redundant typedefs are allowed - I'm not sure there's any animal
> > left that doesn't support that (rather than triggering an error via
> > an intentionally set flag).
> 
> I'd be happy with that if it actually works, but I strongly suspect
> that it does not.  Or can you cite chapter and verse where C99
> requires it to work?  My own compiler complains about "redefinition
> of typedef 'foo'".

It's not required by C99; it is, however, required by C11. But a lot of
compilers have allowed it as an extension for a long time (since before
C99), unless suppressed by some option. I think that's partially because
C++ has allowed it for longer.  I don't know how many of the BF
compilers could be made to accept that - I'd be very surprised if yours couldn't.
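
For concreteness, the construct in question is just a repeated typedef of
the same name; an illustrative (not PostgreSQL-specific) example:

/* Accepted by C11, and by many compilers as an extension; a strict C99
 * compiler may reject the second typedef as a redefinition. */
struct PlannerInfo;
typedef struct PlannerInfo PlannerInfo;
typedef struct PlannerInfo PlannerInfo;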


> >> I would have exposed estimate_rel_size, which is needed by
> >> access/hash/hash.c, except that it requires Relation and
> >> BlockNumber typedefs.  The incremental value from keeping
> >> hash.c from using plancat.h probably isn't worth widening
> >> optimizer.h's #include footprint further.  Also, I wonder
> >> whether that whole area needs a rethink for pluggable storage.
> 
> > What kind of rejiggering were you thinking of re pluggable storage?
> 
> I wasn't; I was just thinking that I didn't want to advertise it
> as a stable globally-accessible API if it's likely to get whacked
> around soon.

Ah. So far the signature didn't need to change, and I'm not aware of
pending issues requiring it.

Greetings,

Andres Freund



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-28 Thread Jesper Pedersen

On 1/28/19 3:50 PM, Peter Eisentraut wrote:
>> Done, and v4 attached.
> 
> I would drop the changes in pgupgrade.sgml.  We don't need to explain
> what doesn't happen, when nobody before now ever thought that it would
> happen.
> 
> Also, we don't need the versioning stuff.  The new cluster in pg_upgrade
> is always of the current version, and we know what that one supports.

Done.

Best regards,
 Jesper
From 5b51f927dff808b4a8516b424c2ef790dbe96da6 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen 
---
 src/bin/pg_upgrade/check.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..847e604088 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1)
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data, user_opts.jobs,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-28 Thread Peter Eisentraut
On 28/01/2019 17:47, Jesper Pedersen wrote:
> On 1/28/19 11:30 AM, Fabrízio de Royes Mello wrote:
>> IMHO you should use long option name '--jobs' like others.
>>
> 
> Thanks for your feedback !
> 
> Done, and v4 attached.

I would drop the changes in pgupgrade.sgml.  We don't need to explain
what doesn't happen, when nobody before now ever thought that it would
happen.

Also, we don't need the versioning stuff.  The new cluster in pg_upgrade
is always of the current version, and we know what that one supports.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposed refactoring of planner header files

2019-01-28 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-28 15:17:19 -0500, Tom Lane wrote:
>> Since I was intentionally trying to limit what optimizer.h pulls in,
>> and in particular not let it include relation.h, I needed an opaque
>> typedef for PlannerInfo.  On the other hand relation.h also needs to
>> typedef that.  I fixed that with a method that we've not used in our
>> code AFAIK, but is really common in system headers: there's a #define
>> symbol to remember whether we've created the typedef, and including
>> both headers in either order will work fine.

> Ugh, isn't it nicer to just use the underlying struct type instead of
> that?

No, because that'd mean that anyplace relying on optimizer.h would also
have to write "struct PlannerInfo" rather than "PlannerInfo"; the
effects wouldn't be limited to the header itself.

> Or alternatively we could expand our compiler demands to require
> that redundant typedefs are allowed - I'm not sure there's any animal
> left that doesn't support that (rather than triggering an error via
> an intentionally set flag).

I'd be happy with that if it actually works, but I strongly suspect
that it does not.  Or can you cite chapter and verse where C99
requires it to work?  My own compiler complains about "redefinition
of typedef 'foo'".

>> I'm really unhappy that force_parallel_mode and
>> parallel_leader_participation are being treated as planner GUCs.
>> They are not that, IMO, because they also affect the behavior of
>> the executor, cf HandleParallelMessage, ExecGather, ExecGatherMerge.
>> This is somewhere between ill-considered and outright broken: what
>> happens if the values change between planning and execution?

> Stylistically I agree with that (and it's what I ended up doing for JIT
> compilation as well), but do you see concrete harm here?

Well, I don't know: I haven't tried to trace the actual effects if
the values change.  Seems like they could be pretty bad, in the worst
case if this wasn't thought through, which it looks like it wasn't.

In any case, if these settings are somehow global rather than being
genuinely planner GUCs, then the planner shouldn't be responsible
for defining them ... maybe they belong in miscadmin.h?

>> I would have exposed estimate_rel_size, which is needed by
>> access/hash/hash.c, except that it requires Relation and
>> BlockNumber typedefs.  The incremental value from keeping
>> hash.c from using plancat.h probably isn't worth widening
>> optimizer.h's #include footprint further.  Also, I wonder
>> whether that whole area needs a rethink for pluggable storage.

> What kind of rejiggering were you thinking of re pluggable storage?

I wasn't; I was just thinking that I didn't want to advertise it
as a stable globally-accessible API if it's likely to get whacked
around soon.

regards, tom lane



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Tom Lane
Robert Haas  writes:
> However, generally we have not had great luck with just sticking
> keywords in there (cf. VACUUM, ANALYZE, EXPLAIN, COPY) which is why I
> suggested using a flexible syntax with parenthesized options.

Fair, except that as you then proceed to point out, that does not work
either before or after the AS.

Conceivably we could make it work without the parens:

WITH ctename AS [ option = value [ , ... ] ] ( query ... )

which for the immediate feature I'd be tempted to spell as

WITH ctename AS [ materialize = on/off ] ( query ... )

I think the only reason the syntax is MATERIALIZED with a D is that
that's already a keyword; it reads a bit awkwardly IMO.  But if we
were accepting a ColId there, there'd be room to adjust the spelling.

That said, this is still clunkier than the existing proposal, and
I'm not really convinced that we need to leave room for more options.

regards, tom lane



Re: Proposed refactoring of planner header files

2019-01-28 Thread Robert Haas
On Mon, Jan 28, 2019 at 3:17 PM Tom Lane  wrote:
> I'm really unhappy that force_parallel_mode and
> parallel_leader_participation are being treated as planner GUCs.
> They are not that, IMO, because they also affect the behavior of
> the executor, cf HandleParallelMessage, ExecGather, ExecGatherMerge.
> This is somewhere between ill-considered and outright broken: what
> happens if the values change between planning and execution?

The only use of parallel_leader_participation at plan time seems to be
to twiddle the costing, and the use of it in the executor is to decide
whether to have the leader participate.  So if the values differ,
you'll get a plan running a behavior for which plan selection was not
optimized.  I don't know whether it's useful to intentionally allow
this so that you can see how the same plan behaves under the other
setting, or whether it's just a wart we'd be better off without.  It
might be confusing, though, if you change the setting and it doesn't
force a replan.

Similarly, the use of force_parallel_mode in HandleParallelMessage()
really just affects what CONTEXT lines you get.  That may be
ill-considered but it doesn't seem outright broken.  Here again, I'm
not really sure that forcibly preserving the plan-time value at
execution time would really result in happier users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Proposed refactoring of planner header files

2019-01-28 Thread Andres Freund
Hi,

On 2019-01-28 15:17:19 -0500, Tom Lane wrote:
> In <6044.1548524...@sss.pgh.pa.us> I worried about how much planner
> stuff that patch might end up dragging into files that contain
> planner support functions, and suggested that we could refactor the
> planner's header files to reduce the inclusion footprint.  Attached
> are some proposed patches to improve the situation.  The work isn't
> fully done yet, but I was hoping to get buy-in on this approach
> before going further.

> The basic idea here is to create a new header file, which I chose
> to call optimizer/optimizer.h, and put into it just the stuff that
> "outside" callers of the planner might need.  For this purpose
> "outside" can be approximated as "doesn't really need to know what
> is in relation.h", ie Paths and related data structures.  I expect
> that planner support functions will mostly be working with parsetree
> data structures for their functions, so they should fit that
> restriction.  In some cases they need to be able to pass a PlannerInfo
> pointer through to some planner function they want to call, but they
> can treat the pointer as an opaque type.  This worked out pretty well,
> as I was able to eliminate uses of all other optimizer/ headers (and,
> therefore, relation.h) from all but four or five files outside
> backend/optimizer/.  The holdouts are mostly places that are pretty
> much in bed with the planner anyway, such as statistics/dependencies.c.


Without having studied the patch in any detail, that seems a worthwhile
goal to me.


> Since I was intentionally trying to limit what optimizer.h pulls in,
> and in particular not let it include relation.h, I needed an opaque
> typedef for PlannerInfo.  On the other hand relation.h also needs to
> typedef that.  I fixed that with a method that we've not used in our
> code AFAIK, but is really common in system headers: there's a #define
> symbol to remember whether we've created the typedef, and including
> both headers in either order will work fine.

Ugh, isn't it nicer to just use the underlying struct type instead of
that?  Or alternatively we could expand our compiler demands to require
that redundant typedefs are allowed - I'm not sure there's any animal
left that doesn't support that (rather than triggering an error via
an intentionally set flag).


> I'm really unhappy that force_parallel_mode and
> parallel_leader_participation are being treated as planner GUCs.
> They are not that, IMO, because they also affect the behavior of
> the executor, cf HandleParallelMessage, ExecGather, ExecGatherMerge.
> This is somewhere between ill-considered and outright broken: what
> happens if the values change between planning and execution?  I think
> we probably need to fix things so that those variables do not need to
> be looked at by the executor, carrying them forward in the plan
> tree if necessary.  Then they'd not need to be exposed by
> optimizer.h, and indeed I think the mentioned modules wouldn't
> need any optimizer inclusions anymore.

Stylistically I agree with that (and it's what I ended up doing for JIT
compilation as well), but do you see concrete harm here? I don't think
it's super likely that anybody changes these on the fly and expects
immediate effects?


> I would have exposed estimate_rel_size, which is needed by
> access/hash/hash.c, except that it requires Relation and
> BlockNumber typedefs.  The incremental value from keeping
> hash.c from using plancat.h probably isn't worth widening
> optimizer.h's #include footprint further.  Also, I wonder
> whether that whole area needs a rethink for pluggable storage.

The pluggable storage patchset currently makes estimate_rel_size() a
callback for table-like objects. Using BlockNumber there isn't all that
pretty (it's one of a few tableam functions dealing with blocks, the
others being samplescan and bitmapscan). I've been wondering whether we
should change the API to return the size in a non-block-oriented way,
but given the current costing I wasn't sure that's worth much.

There's an XXX in the changed code wondering whether we should make
estimate_rel_size() also call a callback for indexes, that'd e.g. allow
better per-index logic (c.f. the function's comment about gist).

What kind of rejiggering were you thinking of re pluggable storage?

Greetings,

Andres Freund



Re: Alternative to \copy in psql modelled after \g

2019-01-28 Thread Corey Huinker
> Otherwise "\g -" looks good as a portable solution.

+1



Proposed refactoring of planner header files

2019-01-28 Thread Tom Lane
In <6044.1548524...@sss.pgh.pa.us> I worried about how much planner
stuff that patch might end up dragging into files that contain
planner support functions, and suggested that we could refactor the
planner's header files to reduce the inclusion footprint.  Attached
are some proposed patches to improve the situation.  The work isn't
fully done yet, but I was hoping to get buy-in on this approach
before going further.

The basic idea here is to create a new header file, which I chose
to call optimizer/optimizer.h, and put into it just the stuff that
"outside" callers of the planner might need.  For this purpose
"outside" can be approximated as "doesn't really need to know what
is in relation.h", ie Paths and related data structures.  I expect
that planner support functions will mostly be working with parsetree
data structures for their functions, so they should fit that
restriction.  In some cases they need to be able to pass a PlannerInfo
pointer through to some planner function they want to call, but they
can treat the pointer as an opaque type.  This worked out pretty well,
as I was able to eliminate uses of all other optimizer/ headers (and,
therefore, relation.h) from all but four or five files outside
backend/optimizer/.  The holdouts are mostly places that are pretty
much in bed with the planner anyway, such as statistics/dependencies.c.

I did not attempt to narrow the API used by FDWs, so file_fdw and
postgres_fdw are two of the main places that still need other
optimizer/ headers.  It might be useful to do a similar exercise
focusing on the API seen by FDWs, but that's for another time.

Also, I didn't work on tightening selfuncs.c's dependencies.
While I don't have a big problem with considering selfuncs.c to be
in bed with the planner, that's risky in that whatever dependencies
selfuncs.c has may well apply to extensions' selectivity estimators too.
What I'm thinking about doing there is trying to split selfuncs.c into
two parts, one being infrastructure that can be tightly tied to the
core planner (and, likely, get moved under backend/optimizer/) and the
other being estimators that use a limited API and can serve as models
for extension code.  But I haven't tried to do that yet, and would like
to get the attached committed first.

There are three patches attached:

0001 takes some very trivial support functions out of clauses.c and
puts them into the generic node support headers (nodeFuncs.h and
makefuncs.h) where they arguably should have been all along.  I also
took the opportunity to rename and_clause() and friends into
is_andclause() etc, to make it clearer that they are node-type-testing
functions not node-construction functions, and improved the style a
bit by using "static inline" where practical.
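
As a rough sketch of the shape of those helpers (illustrative, not
necessarily the exact patch contents):

/* Node-type-testing helper as a static inline in nodeFuncs.h; assumes the
 * usual Node/BoolExpr definitions from the nodes headers. */
static inline bool
is_andclause(const void *clause)
{
    return (clause != NULL &&
            IsA(clause, BoolExpr) &&
            ((const BoolExpr *) clause)->boolop == AND_EXPR);
}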

0002 adjusts the API of flatten_join_alias_vars() and some subsidiary
functions so that they take a Query not a PlannerInfo to define the
context they're using for Var transformation.  This makes it greatly
less ugly for parse_agg.c to call that function.  Without this change
it's impossible for parse_agg.c to be decoupled from relation.h.
It likely also saves some tiny number of cycles, by removing one level
of pointer indirection within that processing.
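
In terms of the (simplified) prototypes, the change amounts to something
like this; illustration only, not the literal diff:

/* before: context supplied via the planner's PlannerInfo */
/* extern Node *flatten_join_alias_vars(PlannerInfo *root, Node *node); */
/* after: context is just the Query whose join aliases are being expanded */
extern Node *flatten_join_alias_vars(Query *query, Node *node);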

0003 then creates optimizer.h, moves relevant declarations there, and
adjusts #includes as needed.

Since I was intentionally trying to limit what optimizer.h pulls in,
and in particular not let it include relation.h, I needed an opaque
typedef for PlannerInfo.  On the other hand relation.h also needs to
typedef that.  I fixed that with a method that we've not used in our
code AFAIK, but is really common in system headers: there's a #define
symbol to remember whether we've created the typedef, and including
both headers in either order will work fine.
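
Concretely, both headers carry something along these lines (sketch; the
guard macro's name is illustrative):

#ifndef HAVE_PLANNERINFO_TYPEDEF
typedef struct PlannerInfo PlannerInfo;
#define HAVE_PLANNERINFO_TYPEDEF 1
#endif

Whichever header is included first creates the typedef; the second one
sees the guard macro and skips it.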

optimizer.h exposes a few of the planner's GUCs, but just the basic
cost parameters, which are likely to be useful to planner support
functions.  Another choice is to expose all of them, but I'm not
sure that's a great idea --- see gripe below for an example of why
that can encourage broken coding.

I intentionally limited 0003 to just do header refactoring, not code
motion, so there are some other follow-on tasks I'm thinking about.
Notably:

I'm really unhappy that force_parallel_mode and
parallel_leader_participation are being treated as planner GUCs.
They are not that, IMO, because they also affect the behavior of
the executor, cf HandleParallelMessage, ExecGather, ExecGatherMerge.
This is somewhere between ill-considered and outright broken: what
happens if the values change between planning and execution?  I think
we probably need to fix things so that those variables do not need to
be looked at by the executor, carrying them forward in the plan
tree if necessary.  Then they'd not need to be exposed by
optimizer.h, and indeed I think the mentioned modules wouldn't
need any optimizer inclusions anymore.

Most everything that's being exposed from tlist.c and var.c could be
argued to be gen

Re: proposal: new polymorphic types - commontype and commontypearray

2019-01-28 Thread Robert Haas
On Fri, Jan 25, 2019 at 7:21 PM Tom Lane  wrote:
> Anyway I think the names need to be any-something.

To me, that seems unnecessarily rigid.  Not a bad idea if we can come
up with something that is otherwise acceptable.  But all of your
suggestions sound worse than Pavel's proposal, so...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread John Naylor
On Mon, Jan 28, 2019 at 12:10 PM Amit Kapila  wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
>  wrote:
> >
> 1.
> @@ -26,7 +26,7 @@
>  pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
>   heap_size | fsm_size
>  ---+--
> - 24576 |0
> + 32768 |0
>  (1 row)
>
>  -- Extend table with enough blocks to exceed the FSM threshold
> @@ -56,7 +56,7 @@
>  SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
>   fsm_size
>  --
> -16384
> +24576
>  (1 row)
>
>
> As discussed on another thread, this seems to be due to the reason
> that a parallel auto-analyze doesn't allow vacuum to remove dead-row
> versions.  To fix this, I think we should avoid having a dependency on
> vacuum to remove dead rows.

Ok, to make the first test here more reliable I will try Andrew's idea
to use fillfactor to save free space. As I said earlier, I think that
second test isn't helpful and can be dropped.

> 2.
> @@ -15,13 +15,9 @@
>  SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
>  ERROR:  block number 100 is out of range for relation "test_rel_forks"
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> - fsm_0
> 
> -  8192
> -(1 row)
> -
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> -ERROR:  block number 10 is out of range for relation "test_rel_forks"
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
>
> This indicates that even though the Vacuum is executed, but the FSM
> doesn't get created.  This could be due to different BLCKSZ, but the
> failed machines don't seem to have a non-default value of it.  I am
> not sure why this could happen, maybe we need to check once in the
> failed regression database to see the size of relation?

I'm also having a hard time imagining why this failed. Just in case,
we could return ctid in a plpgsql loop and stop as soon as we see the
5th block. I've done that for some tests during development and is a
safer method anyway.

> 
>
> @@ -377,20 +383,9 @@ RelationGetBufferForTuple(Relation relation, Size len,
> +
> + /*
> + * In case we used an in-memory map of available blocks, reset it
> + * for next use.
> + */
> + if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
> + FSMClearLocalMap();
> +
>
> I think here you need to clear the map if it exists or clear it
> unconditionally, the earlier one would be better.

Ok, maybe all callers should call it unconditionally, but within the
function, check "if (FSM_LOCAL_MAP_EXISTS)"?
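
Something like this standalone toy sketch of that calling convention (not
the patch code; the names are only meant to mirror the discussion):

#include <stdbool.h>
#include <string.h>

#define HEAP_FSM_CREATION_THRESHOLD 4

typedef struct
{
    int  nblocks;                               /* 0 means "no map in use" */
    bool avail[HEAP_FSM_CREATION_THRESHOLD];    /* per-block availability  */
} LocalMap;

static LocalMap fsm_local_map;

#define FSM_LOCAL_MAP_EXISTS (fsm_local_map.nblocks > 0)

/* Callers invoke this unconditionally; it is a no-op if no map was built. */
static void
FSMClearLocalMap(void)
{
    if (FSM_LOCAL_MAP_EXISTS)
    {
        fsm_local_map.nblocks = 0;
        memset(fsm_local_map.avail, 0, sizeof(fsm_local_map.avail));
    }
}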

Thanks for investigating the failures -- I'm a bit pressed for time this week.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Robert Haas
On Sun, Jan 27, 2019 at 8:23 AM Andrew Gierth
 wrote:
> I think it makes more sense to say
> that we never inline if MATERIALIZED is specified, that we always inline
> if NOT MATERIALIZED is specified, and that if neither is specified the
> planner will choose (but perhaps note that currently it always chooses
> only based on refcount).

I, too, like this approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-28 Thread Robert Haas
On Mon, Jan 21, 2019 at 10:28 AM Tom Lane  wrote:
> Andreas Karlsson  writes:
> > I have a minor biksheddish question about the syntax.
> > You proposed:
> > WITH cte_name [[NOT] MATERIALIZED] AS (query) main_query
> > While Andrew proposed:
> > WITH cte_name AS [[NOT] MATERIALIZED] (query) main_query
> > Do people have any preference between these two?
>
> FWIW, I'd independently thought that the latter is more readable,
> and probably less likely to have syntax problems with future
> extensions (since AS is already fully reserved).  Didn't get
> around to mentioning it yet, but +1 for putting AS first.

It seems to me that as long as the query has to be surrounded by
non-optional parentheses and the options list never starts with a
parenthesis, there isn't much room for a grammar conflict either way.
If the query didn't have to be surrounded by parentheses, you would
want to put the options before the word AS so that the word AS would
serve as an unambiguous terminator for the options specification, but
if the query must be preceded by an opening parenthesis then as long
as the options list can't include an option that begins with such a
parenthesis we are in good shape.

However, generally we have not had great luck with just sticking
keywords in there (cf. VACUUM, ANALYZE, EXPLAIN, COPY) which is why I
suggested using a flexible syntax with parenthesized options.  And
then you're going to have trouble getting bison to figure out what to
do after...

WITH something AS ( some_keyword

...because some_keyword might be an option name or the first word
of a query.  And putting that syntax before AS isn't any better,
because now it can be confused with a column name list.

I am not deeply worked up about this, just proposing some things to think about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Remove references to Majordomo

2019-01-28 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Mon, Jan 28, 2019 at 7:26 PM Tom Lane  wrote:
> 
> > Stephen Frost  writes:
> > >> On Sun, Jan 27, 2019 at 2:28 AM Noah Misch  wrote:
> > >>> What are those blocked infrastructure improvements?
> >
> > > The specific improvements we're talking about are DKIM/DMARC/SPF, which
> > > is becoming more and more important to making sure that the email from
> > > our lists can actually get through to the subscribers.
> >
> > Certainly those are pretty critical.  But can you give us a quick
> > refresher on why dropping the @postgresql.org list aliases is
> > necessary for that?  I thought we'd already managed to make the
> > lists compliant with those specs.
> 
> I believe it doesn't, as Stephen also agreed with upthread.
> 
> We needed to move our *sending* out of the postgresql.org domain in order
> to be able to treat them differently. But there is nothing preventing us
> from receiving to e.g. pgsql-b...@postgresql.org and internally forward it
> to @lists.postgresql.org, where we then deliver from.

Yes, I *think* this will work: as long as we are sending it back out
from pgsql-b...@lists.postgresql.org, we should be able to have SPF
records for lists.postgresql.org, and downstream mail servers should be
happy with that, though I still want to actually test it out in our test
instance of PGLister.

This is the main thing: we want lists.postgresql.org (and
friends) to have SPF (and maybe DKIM..) records which basically say that
malur is allowed to send mail out from those lists (or with those lists
in the From: of the email in the case of DKIM), but we don't want to
make everyone who is sending email from a @postgresql.org have to relay
through our mail servers (well, at least not today..  we may get to a
point in the spam wars where we *have* to do that or their email ends up
not going through, but we aren't quite there yet).

> I believe we *can* do the same for all lists, but that part is more a
> matter of cleaning up our infrastructure, which has a fair amount of cruft
> to deal with those things. We have an easy workaround for a couple of lists
> which would take only a fairly small amount of traffic over it, but we'd
> like to get rid of the cruft to deal with the large batch of them.

Yes, there's this aspect of it also.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Remove references to Majordomo

2019-01-28 Thread Magnus Hagander
On Mon, Jan 28, 2019 at 7:26 PM Tom Lane  wrote:

> Stephen Frost  writes:
> >> On Sun, Jan 27, 2019 at 2:28 AM Noah Misch  wrote:
> >>> What are those blocked infrastructure improvements?
>
> > The specific improvements we're talking about are DKIM/DMARC/SPF, which
> > is becoming more and more important to making sure that the email from
> > our lists can actually get through to the subscribers.
>
> Certainly those are pretty critical.  But can you give us a quick
> refresher on why dropping the @postgresql.org list aliases is
> necessary for that?  I thought we'd already managed to make the
> lists compliant with those specs.
>

I believe it doesn't, as Stephen also agreed with upthread.

We needed to move our *sending* out of the postgresql.org domain in order
to be able to treat them differently. But there is nothing preventing us
from receiving to e.g. pgsql-b...@postgresql.org and internally forward it
to @lists.postgresql.org, where we then deliver from.

I believe we *can* do the same for all lists, but that part is more a
matter of cleaning up our infrastructure, which has a fair amount of cruft
to deal with those things. We have an easy workaround for a couple of lists
which would take only a fairly small amount of traffic over it, but we'd
like to get rid of the cruft to deal with the large batch of them.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Covering GiST indexes

2019-01-28 Thread Andrey Borodin
Thank you very much for reviewing the patch!

> On 28 Jan 2019, at 12:15, Andreas Karlsson wrote:
> 
> = Code
> 
> * Have some minor style issues like that there should be spaces around || (in 
> gistcanreturn()) and ? and : (in gistFormTuple()).
Fixed.
> 
> * I do not see any need for adding the new parameter to gistFormTuple. As far 
> as I can tell isTruncated is always equal to !isleaf.
You are right. I've removed isTruncated parameter.
> 
> * The comment below from gistFormTuple() does not look correct. No allocation 
> is taking place.
> 
> /*
> * Allocate each included attribute.
> */
Fixed.
> 
> * Why do you set a supportCollation for the included columns? As far as I can 
> tell the collation is only used by the various GiST support functions. Also 
> in theory we should be able to skip initializing these array entires, but it 
> is probably for the best that we do.
Removed supportCollation.
> 
> * I think you should check for the attno first in gistcanreturn() to take 
> advantage of the short circuiting.
Done.
> 
> * I am no fan of the tupdesc vs truncTupdesc separation and think that it is 
> a potential hazard, but I do not have any better suggestion right now.
B-tree copies the tupdesc every time it truncates a tuple. We need tuple 
truncation a little more often: when we are doing a page split, we have to form 
all page tuples in truncated form.
Initially I optimized only that case, but it led to keeping a prepared tupledesc 
for truncated tuples.
> 
> * There is no test case for exclusion constraints, and I feel since that is 
> one of the more important uses we should probably have at least one such test 
> case.

Actually, I did not understand this test case. Can you elaborate on it? 
How should included attributes participate in an exclusion-constraint index, and what for?

> = Minor comments
> 
> * I think that "the" should have been kept in the text below.
> 
> -Currently, only the B-tree index access method supports this feature.
> +Currently, B-tree and GiST index access methods supports this 
> feature.
> 
Fixed.
> * I am not a native speaker but I think it should be "use the INCLUDE clause" 
> in the diff below, and I think it also should be "without any GiST operator 
> class".
> 
> +   A GiST index can be covering, i.e. use INCLUDE clause.
> +   Included columns can have data types without GiST operator class. Included
> +   attributes will be stored uncompressed.
> 
Fixed.
> * I think you should write something like "Included attributes always support 
> index-only scans." rather than "Included attributes can be returned." in the 
> comment for gistcanreturn().
> 
Fixed, but slightly reworded.

> * Why the random noise in the diff below? I think using "(c3) INCLUDE (c4)" 
> for both gist and rtree results in a cleaner patch.
I've used columns with and without opclass in INCLUDE. This led to these 
seemingly random changes.

PFA v6. Thanks for reviewing!


Best regards, Andrey Borodin.



0001-Covering-GiST-v6.patch
Description: Binary data


Re: pgsql: Remove references to Majordomo

2019-01-28 Thread Tom Lane
Stephen Frost  writes:
>> On Sun, Jan 27, 2019 at 2:28 AM Noah Misch  wrote:
>>> What are those blocked infrastructure improvements?

> The specific improvements we're talking about are DKIM/DMARC/SPF, which
> is becoming more and more important to making sure that the email from
> our lists can actually get through to the subscribers.

Certainly those are pretty critical.  But can you give us a quick
refresher on why dropping the @postgresql.org list aliases is
necessary for that?  I thought we'd already managed to make the
lists compliant with those specs.

regards, tom lane



Re: [PATCH] Allow anonymous rowtypes in function return column definition

2019-01-28 Thread Elvis Pranskevichus
On Sunday, January 13, 2019 4:57:48 PM EST Tom Lane wrote:

> I also wonder why we'd allow RECORD but not RECORDARRAY.  

That's an omission.  There's no reason to refuse RECORDARRAY here, for 
the same reason that RECORD is accepted.

> More generally, why not any polymorphic type? [...] the
> fact that a record value is self-identifying as long as you know
> it's a record, whereas a random Datum is not a self-identifying
> member of the type class "anyelement", for instance.

Yes that's the reason.  We allow "record" in coldeflist because it 
only happens near a RangeFunction, and set-returning functions always do 
"BlessTupleDesc", which means that RECORDOID data would always have an 
associated TupleDesc and can be processed as a regular composite type.  
Recursion is not an issue, since every instance would have a separate 
TupleDesc even if the shape is the same.
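
For reference, the "blessing" in question is the usual set-returning
function pattern; a hedged sketch (not code from this patch), assuming the
funcapi.h helpers:

/* Resolve the result row type (possibly from the caller's coldeflist) and
 * bless it, so any "record" datums returned carry a registered TupleDesc. */
TupleDesc   tupdesc;

if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
    elog(ERROR, "return type must be a row type");
tupdesc = BlessTupleDesc(tupdesc);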

> I feel a bit uncomfortable about defining the new flag to
> CheckAttributeNamesTypes as "allow_anonymous_records"; that
> seems pretty short-sighted and single-purpose, especially in
> view of there being no obvious reason why it shouldn't accept
> RECORDARRAY too.  Maybe with a clearer explanation of the
> issues above, we could define that flag in a more on-point way.

I renamed it to "in_coldeflist", which seems like a clearer indication 
that we are validating a column definition list, and not a regular table definition.

> BTW, it strikes me that maybe the reason ANYARRAY isn't insane
> to allow in pg_statistic has to do with arrays also being
> self-identifying members of that type class

That's true.  Array data encode the OID of their element type, but that 
only allows sending the data out, as subscripting or casting anyarray is 
not allowed.  There also seems to be no guarantee that the actual type 
of the array doesn't change from row to row in such a scenario.  Given 
that, I think it would be best to keep anyarray restricted to the system 
catalogs.

I attached the updated patch.
 
   Elvis
From 2841fd41db106ae09b68d3b5cf29901e98e2446a Mon Sep 17 00:00:00 2001
From: Elvis Pranskevichus 
Date: Thu, 6 Dec 2018 17:16:28 -0500
Subject: [PATCH] Allow anonymous rowtypes in function return column definition

Currently, the following query

SELECT q.b = row(2)
FROM unnest(ARRAY[row(1, row(2))]) AS q(a int, b record);

would fail with

ERROR:  column "b" has pseudo-type record

This is due to CheckAttributeNamesTypes() being used on a function
coldeflist as if it was a real relation definition.  But in the context
of a query there seems to be no harm in allowing this, as other ways of
manipulating anonymous rowtypes work well, e.g.:

SELECT (ARRAY[ROW(1, ROW(2))])[1];
---
 src/backend/catalog/heap.c | 42 ++
 src/backend/catalog/index.c|  2 +-
 src/backend/commands/tablecmds.c   |  4 +--
 src/backend/parser/parse_relation.c|  5 +--
 src/include/catalog/heap.h |  6 ++--
 src/test/regress/expected/rowtypes.out | 10 ++
 src/test/regress/sql/rowtypes.sql  |  5 +++
 7 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index cc865de627..103b4fb96a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -451,7 +451,8 @@ heap_create(const char *relname,
  */
 void
 CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
-		 bool allow_system_table_mods)
+		 bool allow_system_table_mods,
+		 bool in_coldeflist)
 {
 	int			i;
 	int			j;
@@ -509,7 +510,8 @@ CheckAttributeNamesTypes(TupleDesc tupdesc, char relkind,
 		   TupleDescAttr(tupdesc, i)->atttypid,
 		   TupleDescAttr(tupdesc, i)->attcollation,
 		   NIL, /* assume we're creating a new rowtype */
-		   allow_system_table_mods);
+		   allow_system_table_mods,
+		   in_coldeflist);
 	}
 }
 
@@ -532,7 +534,8 @@ void
 CheckAttributeType(const char *attname,
    Oid atttypid, Oid attcollation,
    List *containing_rowtypes,
-   bool allow_system_table_mods)
+   bool allow_system_table_mods,
+   bool in_coldeflist)
 {
 	char		att_typtype = get_typtype(atttypid);
 	Oid			att_typelem;
@@ -540,12 +543,24 @@ CheckAttributeType(const char *attname,
 	if (att_typtype == TYPTYPE_PSEUDO)
 	{
 		/*
-		 * Refuse any attempt to create a pseudo-type column, except for a
-		 * special hack for pg_statistic: allow ANYARRAY when modifying system
-		 * catalogs (this allows creating pg_statistic and cloning it during
-		 * VACUUM FULL)
+		 * Generally speaking, we don't allow pseudo-types in column
+		 * definitions, with the following exceptions:
+		 *
+		 * First, we allow the record pseudo-type (and its array type) in
+		 * the column definition of a set-returning function.  In this case
+		 * the record type is "blessed", i.e. there is an actual TupleDesc
+		 * associated with it and it can be treated as a normal composite
+		 * type.  Com

Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Tom Lane
Robert Haas  writes:
> On Sat, Jan 26, 2019 at 12:35 PM Tom Lane  wrote:
>> Attached is an 0004 that makes a stab at providing some intelligence
>> for unnest() and the integer cases of generate_series().

> That looks awesome.

> I'm somewhat dubious about whole API.  It's basically -- if you have a
> problem and a PhD in PostgreSQL-ology, you can write some C code to
> fix it.  On the other hand, the status quo is that you may as well
> just forget about fixing it, which is clearly even worse.  And I don't
> really know how to do better.

Well, you need to be able to write a C extension :-(.  I kinda wish
that were not a requirement, but in practice I think the main audience
is people like PostGIS, who already cleared that bar.  I hope that
we'll soon have a bunch of examples, like those in the 0004 patch,
that people can look at to see how to do things in this area.  I see
no reason to believe it'll be all that much harder than anything
else extension authors have to do.

regards, tom lane



Re: pgsql: Remove references to Majordomo

2019-01-28 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Sun, Jan 27, 2019 at 2:28 AM Noah Misch  wrote:
> > > Does this also implicitly mean we've just agreed to push back the
> > > retirement of the @postgresql.org aliases for the lists until v11 is
> > > EOL..?
> > >
> > > I can understand the concern around translators and back-patching and
> > > such, but I don't think we should be waiting another 5 years before we
> > > retire those aliases as having them is preventing us from moving forward
> > > with other infrastructure improvements to our email systems.
> >
> > What are those blocked infrastructure improvements?
> 
> +1 for that question.  I find myself wondering what infrastructure
> improvements could possibly be important enough to justify rushing
> this change (or for that matter, ever making it at all).

The specific improvements we're talking about are DKIM/DMARC/SPF, which
is becoming more and more important to making sure that the email from
our lists can actually get through to the subscribers.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Rename nodes/relation.h => nodes/pathnodes.h ?

2019-01-28 Thread Robert Haas
On Mon, Jan 28, 2019 at 10:18 AM Tom Lane  wrote:
> In the pluggable-storage discussion, there was some talk of renaming
> nodes/relation.h to avoid confusion with the new access/relation.h
> header.  I think this is a fine idea, not only because of that conflict
> but because "relation.h" has never made a lot of sense as the file's
> name.

+1.

> After a bit of thought, I propose "pathnodes.h" as the new name.
> That fits in with the other major headers in that directory
> (primnodes.h, parsenodes.h, plannodes.h, execnodes.h), and it seems
> like a reasonable summary of what's in it.  Admittedly, Path nodes
> as such are barely a third of the file's bulk; but I don't see any
> equally pithy way to describe the rest of it, unless something like
> planner_data.h, which is pretty unmelodious.

Yeah, it's not perfect, but it's better than what we've got now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-28 Thread Jesper Pedersen

On 1/28/19 11:30 AM, Fabrízio de Royes Mello wrote:
> IMHO you should use long option name '--jobs' like others.

Thanks for your feedback !

Done, and v4 attached.

Best regards,
 Jesper
From 142b4723f433f39d0eed2ced4beccc3721451103 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Fri, 21 Dec 2018 05:05:31 -0500
Subject: [PATCH] Highlight that the --jobs option isn't passed down to
 vacuumdb by default.

Author: Jesper Pedersen 
---
 doc/src/sgml/ref/pgupgrade.sgml |  6 +-
 src/bin/pg_upgrade/check.c  | 28 ++--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..b1da0cc7a2 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -395,7 +395,11 @@ NET STOP postgresql-&majorversion;
  in parallel;  a good place to start is the maximum of the number of
  CPU cores and tablespaces.  This option can dramatically reduce the
  time to upgrade a multi-database server running on a multiprocessor
- machine.
+ machine. Note, that this option isn't passed to the
+ vacuumdb application by default.
+ The system environment variable VACUUMDB_OPTS can be
+ used to pass extra options to the vacuumdb
+ application during the script execution such as --jobs.
 
 
 
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index fc5aa7010f..3b478ef95f 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -495,15 +495,31 @@ create_script_for_cluster_analyze(char **analyze_script_file_name)
 			ECHO_QUOTE, ECHO_QUOTE);
 	fprintf(script, "echo %sthis script and run:%s\n",
 			ECHO_QUOTE, ECHO_QUOTE);
-	fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
-			new_cluster.bindir, user_specification.data,
-	/* Did we copy the free space files? */
-			(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
-			"--analyze-only" : "--analyze", ECHO_QUOTE);
+	if (user_opts.jobs <= 1 || GET_MAJOR_VERSION(new_cluster.major_version) < 905)
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
+	else
+	{
+		fprintf(script, "echo %s\"%s/vacuumdb\" %s--jobs %d --all %s%s\n", ECHO_QUOTE,
+new_cluster.bindir, user_specification.data, user_opts.jobs,
+/* Did we copy the free space files? */
+(GET_MAJOR_VERSION(old_cluster.major_version) >= 804) ?
+"--analyze-only" : "--analyze", ECHO_QUOTE);
+	}
 	fprintf(script, "echo%s\n\n", ECHO_BLANK);
 
-	fprintf(script, "\"%s/vacuumdb\" %s--all --analyze-in-stages\n",
+#ifndef WIN32
+	fprintf(script, "\"%s/vacuumdb\" %s$VACUUMDB_OPTS --all --analyze-in-stages\n",
 			new_cluster.bindir, user_specification.data);
+#else
+	fprintf(script, "\"%s\\vacuumdb.exe\" %s%%VACUUMDB_OPTS%% --all --analyze-in-stages\n",
+			new_cluster.bindir, user_specification.data);
+#endif
 	/* Did we copy the free space files? */
 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 804)
 		fprintf(script, "\"%s/vacuumdb\" %s--all\n", new_cluster.bindir,
-- 
2.17.2



Re: Allowing extensions to supply operator-/function-specific info

2019-01-28 Thread Robert Haas
On Sat, Jan 26, 2019 at 12:35 PM Tom Lane  wrote:
> Attached is an 0004 that makes a stab at providing some intelligence
> for unnest() and the integer cases of generate_series().

That looks awesome.

I'm somewhat dubious about whole API.  It's basically -- if you have a
problem and a PhD in PostgreSQL-ology, you can write some C code to
fix it.  On the other hand, the status quo is that you may as well
just forget about fixing it, which is clearly even worse.  And I don't
really know how to do better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Remove references to Majordomo

2019-01-28 Thread Robert Haas
On Sun, Jan 27, 2019 at 2:28 AM Noah Misch  wrote:
> > Does this also implicitly mean we've just agreed to push back the
> > retirement of the @postgresql.org aliases for the lists until v11 is
> > EOL..?
> >
> > I can understand the concern around translators and back-patching and
> > such, but I don't think we should be waiting another 5 years before we
> > retire those aliases as having them is preventing us from moving forward
> > with other infrastructure improvements to our email systems.
>
> What are those blocked infrastructure improvements?

+1 for that question.  I find myself wondering what infrastructure
improvements could possibly be important enough to justify rushing
this change (or for that matter, ever making it at all).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-28 Thread Fabrízio de Royes Mello
On Mon, Jan 28, 2019 at 1:15 PM Jesper Pedersen 
wrote:
>
> Done.
>
> Attached is v3.
>

IMHO you should use long option name '--jobs' like others.

Regards,

--
   Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Header checking failures on LLVM-less machines

2019-01-28 Thread Tom Lane
Since the LLVM stuff went in, src/tools/pginclude/cpluspluscheck
fails on my main devel machine, because

In file included from /tmp/cpluspluscheck.jobsM6/test.cpp:3:
./src/include/jit/llvmjit_emit.h:13:25: error: llvm-c/Core.h: No such file or 
directory

and then there's a bunch of ensuing spewage due to missing typedefs
etc.  llvmjit.h has the same problem plus this additional pointless
aggravation:

In file included from /tmp/cpluspluscheck.jobsM6/test.cpp:3:
./src/include/jit/llvmjit.h:15:2: error: #error "llvmjit.h should only be 
included by code dealing with llvm"

I propose that we should fix this by making the whole bodies of those
two headers be #ifdef USE_LLVM.
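
Roughly this shape, i.e. everything that references LLVM types gets
compiled out in builds without LLVM support (sketch, not a patch; the
include-guard name is illustrative):

#ifndef LLVMJIT_EMIT_H
#define LLVMJIT_EMIT_H

#ifdef USE_LLVM

#include <llvm-c/Core.h>

/* ... the existing declarations that use LLVM types ... */

#endif							/* USE_LLVM */

#endif							/* LLVMJIT_EMIT_H */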

regards, tom lane



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-01-28 Thread Robert Haas
On Fri, Jan 25, 2019 at 4:18 PM Alvaro Herrera  wrote:
> Right, the planner/executor "disconnect" is one of the challenges, and
> why I was trying to keep the old copy of the PartitionDesc around
> instead of building updated ones as needed.

I agree that would be simpler, but I just don't see how to make it
work.  For one thing, keeping the old copy around can't work in
parallel workers, which never had a copy in the first place.  For two
things, we don't have a really good mechanism to keep the
PartitionDesc that was used at plan time around until execution time.
Keeping the relation open would do it, but I'm pretty sure that causes
other problems; the system doesn't expect any residual references.

I know you had a solution to this problem, but I don't see how it can
work.  You said "Snapshots have their own cache (hash table) of
partition descriptors. If a partdesc is requested and the snapshot has
already obtained that partdesc, the original one is returned -- we
don't request a new one from partcache."  But presumably this means
when the last snapshot is unregistered, the cache is flushed
(otherwise, when?) and if that's so then this relies on the snapshot
that was used for planning still being around at execution time, which
I am pretty sure isn't guaranteed.

Also, and I think this point is worthy of some further discussion, the
thing that really seems broken to me about your design is the idea
that it's OK to use the query or transaction snapshot to decide which
partitions exist.  The problem with that is that some query or
transaction with an old snapshot might see as a partition some table
that has been dropped or radically altered - different column types,
attached to some other table now, attached to same table but with
different bounds, or just dropped.  And therefore it might try to
insert data into that table and fail in all kinds of crazy ways, about
the mildest of which is inserting data that doesn't match the current
partition constraint.  I'm willing to be told that I've misunderstood
the way it all works and this isn't really a problem for some reason,
but my current belief is that not only is it a problem with your
design, but that it's such a bad problem that there's really no way to
fix it and we have to abandon your entire approach and go a different
route.  If you don't think that's true, then perhaps we should discuss
it further.

> > I suppose the reason for this is so
> > that we don't have to go to the expense of copying the partition
> > bounds from the PartitionDesc into the final plan, but it somehow
> > seems a little scary to me.  Perhaps I am too easily frightened, but
> > it's certainly a problem from the point of view of this project, which
> > wants to let the PartitionDesc change concurrently.
>
> Well, my definition of the problem started with the assumption that we
> would keep the partition array indexes unchanged, so "change
> concurrently" is what we needed to avoid.  Yes, I realize that you've
> opted to change that definition.

I don't think I made a conscious decision to change this, and I'm kind
of wondering whether I have missed some better approach here.  I feel
like the direction I'm pursuing is an inevitable consequence of having
no good way to keep the PartitionDesc around from plan-time to
execution-time, which in turn feels like an inevitable consequence of
the two points I made above: there's no guarantee that the plan-time
snapshot is still registered anywhere by the time we get to execution
time, and even if there were, the associated PartitionDesc may point
to tables that have been drastically modified or don't exist any more.
But it's possible that my chain-of-inevitable-consequences has a weak
link, in which case I would surely like it if you (or someone else)
would point it out to me.

> I may have forgotten some of your earlier emails on this, but one aspect
> (possibly a key one) is that I'm not sure we really need to cope, other
> than with an ERROR, with queries that continue to run across an
> attach/detach -- moreso in absurd scenarios such as the ones you
> described where the detached table is later re-attached, possibly to a
> different partitioned table.  I mean, if we can just detect the case and
> raise an error, and this let us make it all work reasonably, that might
> be better.

Well, that's an interesting idea.  I assumed that users would hate
that kind of behavior with a fiery passion that could never be
quenched.  If not, then the problem changes from *coping* with
concurrent changes to *detecting* concurrent changes, which may be
easier, but see below.

> I think detaching partitions concurrently is a necessary part of this
> feature, so I would prefer not to go with a solution that works for
> attaching partitions but not for detaching them.  That said, I don't see
> why it's impossible to adjust the partition maps in both cases.  But I
> don't have anything better than hand-waving ATM.

The general problem here goes back to

Re: Alternative to \copy in psql modelled after \g

2019-01-28 Thread Daniel Verite
Tom Lane wrote:

> > Now as far as I can see, there is nothing that \copy to file or program
> > can do that COPY TO STDOUT cannot do.
> 
> I don't think there's a way to get the effect of "\copy to pstdout"
> (which, IIRC without any caffeine, means write to psql's stdout regardless
> of where queryFout is currently pointing).

\g /dev/stdout would work already on systems like Linux that provide
this alias.
Otherwise "\g -" looks good as a portable solution.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2019-01-28 Thread Heikki Linnakangas

On 09/01/2019 02:47, Peter Geoghegan wrote:
> On Fri, Dec 28, 2018 at 3:32 PM Peter Geoghegan  wrote:
>> On Fri, Dec 28, 2018 at 3:20 PM Heikki Linnakangas  wrote:
>>> I'm envisioning that you have an array, with one element for each item
>>> on the page (including the tuple we're inserting, which isn't really on
>>> the page yet). In the first pass, you count up from left to right,
>>> filling the array. Next, you compute the complete penalties, starting
>>> from the middle, walking outwards.
>>
>> Ah, right. I think I see what you mean now.
>>
>> Leave it with me. I'll need to think about this some more.
>
> Attached is v10 of the patch series, which has many changes based on
> your feedback. However, I didn't end up refactoring _bt_findsplitloc()
> in the way you described, because it seemed hard to balance all of the
> concerns there. I still have an open mind on this question, and
> recognize the merit in what you suggested. Perhaps it's possible to
> reach a compromise here.


I spent some time first trying to understand the current algorithm, and 
then rewriting it in a way that I find easier to understand. I came up 
with the attached. I think it optimizes for the same goals as your 
patch, but the approach  is quite different. At a very high level, I 
believe the goals can be described as:


1. Find out how much suffix truncation is possible, i.e. how many key 
columns can be truncated away, in the best case, among all possible ways 
to split the page.


2. Among all the splits that achieve that optimum suffix truncation, 
find the one with smallest "delta".


For performance reasons, it doesn't actually do it in that order. It's 
more like this:


1. First, scan all split positions, recording the 'leftfree' and 
'rightfree' at every valid split position. The array of possible splits 
is kept in order by offset number. (This scans through all items, but 
the math is simple, so it's pretty fast)


2. Compute the optimum suffix truncation, by comparing the leftmost and 
rightmost keys, among all the possible split positions.


3. Split the array of possible splits in half, and process both halves 
recursively. The recursive process "zooms in" to the place where we'd 
expect to find the best candidate, but will ultimately scan through all 
split candidates, if no "good enough" match is found.
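
To make the "smallest delta" criterion concrete, here is a simplified,
self-contained sketch (this is not the attached nbtsplitloc.c code, and it
ignores the suffix-truncation filtering for brevity):

#include <limits.h>
#include <stdlib.h>

typedef struct
{
    int leftfree;    /* free space on the left page for this candidate  */
    int rightfree;   /* free space on the right page for this candidate */
} Candidate;

/* Return the index of the candidate whose free space is most evenly
 * balanced, i.e. the one with the smallest "delta". */
static int
smallest_delta(const Candidate *splits, int nsplits)
{
    int best = -1;
    int bestdelta = INT_MAX;

    for (int i = 0; i < nsplits; i++)
    {
        int delta = abs(splits[i].leftfree - splits[i].rightfree);

        if (delta < bestdelta)
        {
            bestdelta = delta;
            best = i;
        }
    }
    return best;
}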



I've only been testing this on leaf splits. I didn't understand how the 
penalty worked for internal pages in your patch. In this version, the 
same algorithm is used for leaf and internal pages. I'm sure this still 
has bugs in it, and could use some polishing, but I think this will be 
more readable way of doing it.



What have you been using to test this? I wrote the attached little test 
extension, to explore what _bt_findsplitloc() decides in different 
scenarios. It's pretty rough, but that's what I've been using while 
hacking on this. It prints output like this:


postgres=# select test_split();
NOTICE:  test 1:
left2/358: 1 0
left  358/358: 1 356
right   1/ 51: 1 357
right  51/ 51: 1 407  SPLIT TUPLE
split ratio: 12/87

NOTICE:  test 2:
left2/358: 0 0
left  358/358: 356 356
right   1/ 51: 357 357
right  51/ 51: 407 407  SPLIT TUPLE
split ratio: 12/87

NOTICE:  test 3:
left2/358: 0 0
left  320/358: 10 10  SPLIT TUPLE
left  358/358: 48 48
right   1/ 51: 49 49
right  51/ 51: 99 99
split ratio: 12/87

NOTICE:  test 4:
left2/309: 1 100
left  309/309: 1 407  SPLIT TUPLE
right   1/100: 2 0
right 100/100: 2 99
split ratio: 24/75

Each test consists of creating a temp table with one index, and 
inserting rows in a certain pattern, until the root page splits. It then 
prints the first and last tuples on both pages, after the split, as well 
as the tuple that caused the split. I don't know if this is useful to 
anyone but myself, but I thought I'd share it.


- Heikki
/*-
 *
 * nbtsplitloc.c
 *	  Choose split point code for Postgres btree implementation.
 *
 * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 *
 * IDENTIFICATION
 *	  src/backend/access/nbtree/nbtsplitloc.c
 *
 *-
 */
#include "postgres.h"

#include "access/nbtree.h"
#include "storage/lmgr.h"

typedef struct
{
	/* FindSplitData candidate split */
	int			leftfree;		/* free space left on left page by this split */
	int			rightfree;		/* free space left on right page by this split */
	OffsetNumber firstoldonright;	/* first pre-existing item to go on right page */
	bool		newitemonleft;	/* does the new item go on the left page? */
	IndexTuple	lastleft_tuple;	/* would-be last tuple on the left page */
	IndexTuple	firstright_tuple;	/* would-be first tuple on the right page */
}			SplitPoint;

typedef struct
{
	/* context data for _bt_checksplitloc */
	Relation	rel;
	bool		is_leaf;		/* T if splitting a leaf page */
	OffsetNumber newitemoff;	/* where the new item is to be inserted */
	int			leftspace;		/* space available for items on left page */
	int			rightspace;		/* space available for items on right page */
	int			dataitemstotal; /* space taken by all items, old and new */

	int			ncandidates;	/* current number of candidates */

Re: Alternative to \copy in psql modelled after \g

2019-01-28 Thread Tom Lane
"Daniel Verite"  writes:
>   Tom Lane wrote:
>> A variant that might or might not be safer is "\g <file", ie
>> insist on you putting a mark there that shows you intended to read.

> I haven't written any patch yet, but I was thinking of submitting
> something like that, with the addition of "\g >foo" as a synonym of
> "\g foo" for the symmetry with "<".

+1, the same had occurred to me.

regards, tom lane



Re: Rename nodes/relation.h => nodes/pathnodes.h ?

2019-01-28 Thread Amit Langote
On Tue, Jan 29, 2019 at 12:18 AM Tom Lane  wrote:
> In the pluggable-storage discussion, there was some talk of renaming
> nodes/relation.h to avoid confusion with the new access/relation.h
> header.  I think this is a fine idea, not only because of that conflict
> but because "relation.h" has never made a lot of sense as the file's
> name.
>
> After a bit of thought, I propose "pathnodes.h" as the new name.
> That fits in with the other major headers in that directory
> (primnodes.h, parsenodes.h, plannodes.h, execnodes.h), and it seems
> like a reasonable summary of what's in it.  Admittedly, Path nodes
> as such are barely a third of the file's bulk; but I don't see any
> equally pithy way to describe the rest of it, unless something like
> planner_data.h, which is pretty unmelodious.

optnodes.h, as in optimization-related nodes?  I like pathnodes.h too though.

Thanks,
Amit



Re: Alternative to \copy in psql modelled after \g

2019-01-28 Thread Daniel Verite
Tom Lane wrote:

> A variant that might or might not be safer is "\g <file", ie
> insist on you putting a mark there that shows you intended to read.
>
> Also, not quite clear what we'd do about copy-from-program.
> I think "\g |foo" is definitely confusing for that.  "\g foo|"
> would be better if it doesn't have syntax issues.

I haven't written any patch yet, but I was thinking of submitting
something like that, with the addition of "\g >foo" as a synonym of
"\g foo" for the symmetry with "<".
Perl's open() also uses the "program |" syntax.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


