Re: SHOW ALL does not honor pg_read_all_settings membership

2018-04-16 Thread Michael Paquier
On Mon, Apr 16, 2018 at 02:32:10PM +0200, Laurenz Albe wrote:
> Now that the dust from the last commitfest is settling, I'll make a second
> attempt to attract attention for this small bug fix.
> 
> The original commit was Simon's.

Thanks for the ping.

This was new as of v10, so this cannot be listed as a v11 open item; still, I
have added it under the section for older bugs, because you are right
as far as I can see.

GetConfigOption is wrong by the way, as restrict_superuser means that
all members of the group pg_read_all_settings, and not only superusers,
can read GUC_SUPERUSER_ONLY params, so the comment at least needs a fix,
and the variable ought to be renamed as well.
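
For illustration, a minimal reproduction sketch (the role name is made up;
data_directory is one of the GUC_SUPERUSER_ONLY parameters):

CREATE ROLE settings_reader LOGIN;
GRANT pg_read_all_settings TO settings_reader;
-- Connected as settings_reader, the parameter is readable individually,
-- but per the report SHOW ALL does not list the superuser-only entries:
SHOW data_directory;
SHOW ALL;
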
--
Michael


signature.asc
Description: PGP signature


Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-04-16 Thread Michael Paquier
On Sun, Apr 08, 2018 at 08:46:04AM +0300, Arseny Sher wrote:
> I've discovered a couple of bugs in the logical decoding code, both leading
> to incorrect decoding results in somewhat rare cases. First, the xmin of
> slots is advanced too early. This affects the results only when
> interlocking allows DDL to be performed concurrently with looking at the
> schema. In fact, I was not aware of such DDL until I raised this question
> at
> https://www.postgresql.org/message-id/flat/87tvu0p0jm.fsf%40ars-thinkpad#87tvu0p0jm.fsf@ars-thinkpad
> and Andres pointed out ALTER of composite types. Probably there are
> others, I am not sure; it would be interesting to know them.
> 
> Another problem is that new snapshots are never queued to known
> subxacts. It means decoding results can be wrong if the toplevel
> transaction doesn't write anything while a subtransaction does.
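
For illustration, the shape of such a case (table and slot names are made
up; assumes the test_decoding plugin):

SELECT pg_create_logical_replication_slot('s1', 'test_decoding');
BEGIN;
SAVEPOINT sp1;
INSERT INTO t VALUES (1);   -- only the subtransaction writes
RELEASE SAVEPOINT sp1;
-- a concurrent session runs catalog-affecting DDL here, e.g. ALTER TYPE
-- on a composite type used by t
COMMIT;
SELECT * FROM pg_logical_slot_get_changes('s1', NULL, NULL);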

Most people are recovering from the last commit fest, where many patches
have been discussed and dealt with.  I have not looked in detail at
your patch so I cannot say whether it is legit or not, but for now I would
recommend registering this patch in the next commit fest under the
category "Bug Fixes" so that it does not fall through the cracks:
https://commitfest.postgresql.org/18/

I have added an entry to the open items in the section for older bugs:
https://wiki.postgresql.org/wiki/PostgreSQL_11_Open_Items#Older_Bugs
However this list tends to be... Er... Ignored.

> Please see detailed description of the issues, tests which reproduce
> them and fixes in the attached patch.

Those are always good things to have in a patch.
--
Michael


signature.asc
Description: PGP signature


Re: Proposal: Adding json logging

2018-04-16 Thread Michael Paquier
On Mon, Apr 16, 2018 at 07:52:58PM -0400, Peter Eisentraut wrote:
> I have used https://github.com/mpihlak/pg_logforward in the past, which
> seems to be about the same thing.

Didn't know this one.  Thanks.

> I have also had good success using syslog.  While syslog is not very
> structured, the setting syslog_split_messages allows sending log entries
> that include newlines in one piece, which works well if you have some
> kind of full-text search engine at the receiving end.

syslog suffers from the possibility of losing messages if I recall
correctly, right?  This may matter for some critical environments.

Upstream code has escape_json() directly included, which is able to do
the job and makes sure that a single JSON entry is not broken into
multiple lines.  That's what my jsonlog uses to format the strings used,
and what I can see pg_logforward does as well with a custom copy.

As a whole model, producing one JSON object per line and per log-entry
is the most natural format in my opinion.
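
For instance, roughly the shape of such an entry, mimicked here in SQL
(the field names are illustrative, not the actual jsonlog keys):

SELECT row_to_json(e)
  FROM (SELECT now() AS "timestamp",
               'LOG'::text AS error_severity,
               'checkpoint starting: time'::text AS message) e;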

One thing which is perhaps sensitive for JSON is the timestamp format.
The JSON specification does not define a format for timestamps, but
parsing facilities mostly converge on ISO 8601, with things like
JavaScript Date's toJSON method.  There are some side issues with the use
of UTC...  So the thing is sort of messy.
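
For reference, Postgres' own JSON conversion already emits ISO 8601-style
timestamps with the session's UTC offset:

SELECT to_json(now());
-- e.g. "2018-04-17T10:15:30.123456+09:00"  (the value shown is illustrative)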

However, as JSON entries are usually larger than normal log entries, it is
easier to get log entries broken into multiple lines when not using
logging_collector.  People normally don't do that, but I have received
complaints on the matter as well when using Postgres in Docker containers,
for example.  So documenting that logging_collector needs to be enabled
is important if this log format shows up in Postgres.
--
Michael


signature.asc
Description: PGP signature


Re: Overcoming SELECT ... FOR UPDATE permission restrictions

2018-04-16 Thread Tom Lane
Michael Paquier  writes:
> I also don't quite understand the argument of applications relying on
> this behavior.  If they do, that's wrong anyway, so the risk of
> operational disruptions in shared environments would matter more in my
> opinion.

I'm not totally sure about that.  If you suppose that the only purpose
of doing SELECT FOR UPDATE is to clear the way for a subsequent UPDATE,
then people who are using it would certainly have had to grant the
necessary UPDATE permission to let the second command go through.
But I'm not 100% sure that that's the only use-case.  S-F-U could be
useful strictly for mutual-exclusion perhaps.  Or maybe your application
does S-F-U to get row locks, then does DELETE rather than UPDATE.
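
For example, hedged sketches of those two patterns (table names are
illustrative):

-- mutual exclusion only: take the row lock, never UPDATE
BEGIN;
SELECT * FROM job_slots WHERE slot = 1 FOR UPDATE;
-- ... do the mutually-exclusive work ...
COMMIT;

-- lock first, then DELETE rather than UPDATE
BEGIN;
SELECT * FROM work_queue WHERE id = 42 FOR UPDATE;
DELETE FROM work_queue WHERE id = 42;
COMMIT;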

Still, it seems unlikely that somebody would be doing those sorts of
things through two levels of view.  So maybe the set of applications
that would get broken is vanishingly small.

regards, tom lane



Re: [HACKERS] Runtime Partition Pruning

2018-04-16 Thread David Rowley
On 17 April 2018 at 14:33, Alvaro Herrera  wrote:
> David Rowley wrote:
>
>> For a while, during my review of the faster partition pruning patch I
>> was asking Amit to add pfree() calls in various places for this exact
>> reason, but in the end, I gave up and decided it was easier to just
>> create a new memory context to call the planner function from. I've
>> now forgotten the exact reason why I finally decided it was too much
>> trouble.  The pruning code now works using your step logic so perhaps
>> that reason no longer applies, although, on a quick scan of the
>> pruning code now, it seems to require that get_matching_partitions
>> performs a deep pfree of each PruneStepResult. However, there is still
>> partkey_datum_from_expr which performs ExecInitExpr, although perhaps
>> that can just be done once, and the result stashed in the
>> PartitionPruneContext.
>
> I think trying to replace a well-placed MemoryContextReset (or Delete)
> with a bunch of individual pfrees is pointless.

I agree. I think I'd sleep better at night with the context reset in
there rather than hoping we've managed to pfree everything.

I did go and start working on a patch to test how possible this would
be and came up with the attached. I've left a stray
MemoryContextStatsDetail call in there which does indicate that
something is not being freed. I'm just not sure what it is yet.

The patch does happen to improve performance slightly, but that is
most likely due to the caching of the ExprStates rather than the
change of memory management. It's not really possible to do that with
the reset unless we stored the executor's memory context in
PartitionPruneContext and did a context switch back inside
partkey_datum_from_expr before calling ExecInitExpr.

My test case was as follows:

create table p (a int, value int) partition by hash (a);
select 'create table p'||x||' partition of p for values with (modulus 10, remainder '||x||');' from generate_series(0,9) x;
\gexec
create table t1 (a int);

insert into p select x,x from generate_Series(1,1000) x;
insert into t1 select x from generate_series(1,1000) x;

create index on p(a);

set enable_hashjoin = 0;
set enable_mergejoin = 0;
explain analyze select count(*) from t1 inner join p on t1.a=p.a;


-- Unpatched
Execution Time: 19725.981 ms
Execution Time: 19533.655 ms
Execution Time: 19542.854 ms

-- Patched
Execution Time: 17389.537 ms
Execution Time: 17603.802 ms
Execution Time: 17618.670 ms

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


recycle_mem_part_prune.patch
Description: Binary data


Re: [HACKERS] Runtime Partition Pruning

2018-04-16 Thread Alvaro Herrera
David Rowley wrote:

> For a while, during my review of the faster partition pruning patch I
> was asking Amit to add pfree() calls in various places for this exact
> reason, but in the end, I gave up and decided it was easier to just
> create a new memory context to call the planner function from. I've
> now forgotten the exact reason why I finally decided it was too much
> trouble.  The pruning code now works using your step logic so perhaps
> that reason no longer applies, although, on a quick scan of the
> pruning code now, it seems to require that get_matching_partitions
> performs a deep pfree of each PruneStepResult. However, there is still
> partkey_datum_from_expr which performs ExecInitExpr, although perhaps
> that can just be done once, and the result stashed in the
> PartitionPruneContext.

I think trying to replace a well-placed MemoryContextReset (or Delete)
with a bunch of individual pfrees is pointless.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Overcoming SELECT ... FOR UPDATE permission restrictions

2018-04-16 Thread Michael Paquier
On Mon, Apr 16, 2018 at 08:12:45PM +0300, Alexander Lakhin wrote:
> The worst scenario (with the current system views) I could think of is:
> user=> CREATE VIEW pgg AS SELECT * FROM pg_group;
> BEGIN TRANSACTION; SELECT * FROM pgg FOR UPDATE; SELECT pg_sleep(60);
> ROLLBACK;
> and the parallel operation:
> admin=> DROP ROLE testrole;
> hangs for one minute.
> But the admin can observe the locks and kill the offending backend, so it's
> hardly a critical issue.

No need to use pg_sleep(), you could just as well start a full
transaction block and let the wait happen forever.

The main point is that row-level locks are not strong enough to prevent
read-only operations, so critical operations like authentication can
still be triggered.  This can be used though to disrupt the activity of
all DDL operations if you take a lock on them, which sucks.  So while
that's a nuisance, it is always possible to counter it.

I also don't quite understand the argument of applications relying on
this behavior.  If they do, that's wrong anyway, so the risk of
operational disruptions in shared environments would matter more in my
opinion.
--
Michael


signature.asc
Description: PGP signature


Re: Oddity in tuple routing for foreign partitions

2018-04-16 Thread Kyotaro HORIGUCHI
Hello.

At Tue, 17 Apr 2018 10:10:38 +0900, Amit Langote wrote:

> Fujita-san,
> 
> On 2018/04/16 20:25, Etsuro Fujita wrote:
> > Hi,
> > 
> > While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
> > the patch for tuple routing for foreign partitions doesn't work well
> > with remote triggers.  Here is an example:
> > 
> > postgres=# create table loct1 (a int check (a in (1)), b text);
> > postgres=# create function br_insert_trigfunc() returns trigger as
> > $$begin new.b := new.b || ' triggered !'; return new; end$$ language
> > plpgsql;
> > postgres=# create trigger loct1_br_insert_trigger before insert on loct1
> > for each row execute procedure br_insert_trigfunc();
> > postgres=# create table itrtest (a int, b text) partition by list (a);
> > postgres=# create foreign table remp1 (a int check (a in (1)), b text)
> > server loopback options (table_name 'loct1');
> > postgres=# alter table itrtest attach partition remp1 for values in (1);
> > 
> > This behaves oddly:
> > 
> > postgres=# insert into itrtest values (1, 'foo') returning *;
> >  a |  b
> > ---+-
> >  1 | foo
> > (1 row)
> > 
> > INSERT 0 1
> > 
> > The new value of b in the RETURNING result should be concatenated with '
> > triggered !', as shown below:
> > 
> > postgres=# select tableoid::regclass, * from itrtest ;
> >  tableoid | a |b
> > --+---+-
> >  remp1| 1 | foo triggered !
> > (1 row)
> > 
> > The reason for that is: since ExecInitPartitionInfo calls
> > BeginForeignInsert via ExecInitRoutingInfo before initializing
> > ri_returningList of the partition's ResultRelInfo, BeginForeignInsert
> > sees that ri_returningList is NULL and incorrectly concludes that
> > the local query doesn't have a RETURNING list.
> 
> Good catch!
> 
> > So, I moved the
> > ExecInitRoutingInfo call after building the ri_returningList (and before
> > handling ON CONFLICT because we reference the conversion map created by
> > that when handling that clause).  The first version of the patch called
> > BeginForeignInsert in that order, but I updated the patch incorrectly.  :(
> 
> I didn't notice that when reviewing either.  Actually, I was under the
> impression that BeginForeignInsert consumes returningList from the
> ModifyTable node itself, not the ResultRelInfo, but now I see that
> ri_returningList was added exactly for BeginForeignInsert to consume.
> Good thing about that is that we get a Var-translated version instead of
> the original one that contains parent's attnos.
> 
> > Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
> > and added that to ExecInitPartitionInfo right after the InitResultRelInfo
> > call, because it would be better to abort the
> > operation as soon as we find the partition to be non-routable.
> 
> Sounds fine.

If I'm reading this correctly, ExecInitPartitionInfo calls
ExecInitRoutingInfo, so currently CheckValidResultRel is called
for the child when partrel is created in ExecPrepareTupleRouting.
But the move of CheckValidResultRel results in a partrel
just created in ExecPrepareTupleRouting not being checked at all,
since ri_PartitionReadyForRouting is always set to true in
ExecInitPartitionInfo.

> > Another
> > thing I noticed is the RT index of the foreign partition set to 1 in
> > postgresBeginForeignInsert to create a remote query; that would not work
> > for cases where the partitioned table has an RT index larger than 1 (eg,
> > the partitioned table is not the primary ModifyTable node); in which
> > cases, since the varno of Vars belonging to the foreign partition in the
> > RETURNING expressions is the same as the partitioned table, calling
> > deparseReturningList with the RT index set to 1 against the RETURNING
> > expressions would produce attrs_used to be NULL, leading to postgres_fdw
> > not retrieving actually inserted data from the remote, even if remote
> > triggers might change those data.  So, I fixed this as well, by setting
> > the RT index to that of the partitioned table.
> 
> Check.

I'm not sure but is it true that the parent is always at
resultRelation - 1?

> > Attached is a patch
> > for fixing these issues.  I'll add this to the open items list for PG11.
> >  (After addressing this, I'll post an updated version of the
> > fix-postgres_fdw-WCO-handling patch.)
> 
> Your patch seems to fix the issue and code changes seem fine to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Oddity in tuple routing for foreign partitions

2018-04-16 Thread Etsuro Fujita
Hi Amit,

(2018/04/17 10:10), Amit Langote wrote:
> On 2018/04/16 20:25, Etsuro Fujita wrote:
>> While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
>> the patch for tuple routing for foreign partitions doesn't work well
>> with remote triggers.  Here is an example:
>>
>> postgres=# create table loct1 (a int check (a in (1)), b text);
>> postgres=# create function br_insert_trigfunc() returns trigger as
>> $$begin new.b := new.b || ' triggered !'; return new; end$$ language
>> plpgsql;
>> postgres=# create trigger loct1_br_insert_trigger before insert on loct1
>> for each row execute procedure br_insert_trigfunc();
>> postgres=# create table itrtest (a int, b text) partition by list (a);
>> postgres=# create foreign table remp1 (a int check (a in (1)), b text)
>> server loopback options (table_name 'loct1');
>> postgres=# alter table itrtest attach partition remp1 for values in (1);
>>
>> This behaves oddly:
>>
>> postgres=# insert into itrtest values (1, 'foo') returning *;
>>   a |  b
>> ---+-
>>   1 | foo
>> (1 row)
>>
>> INSERT 0 1
>>
>> The new value of b in the RETURNING result should be concatenated with '
>> triggered !', as shown below:
>>
>> postgres=# select tableoid::regclass, * from itrtest ;
>>   tableoid | a |b
>> --+---+-
>>   remp1| 1 | foo triggered !
>> (1 row)
>>
>> The reason for that is: since ExecInitPartitionInfo calls
>> BeginForeignInsert via ExecInitRoutingInfo before initializing
>> ri_returningList of the partition's ResultRelInfo, BeginForeignInsert
>> sees that ri_returningList is NULL and incorrectly concludes that
>> the local query doesn't have a RETURNING list.
> 
> Good catch!
> 
>> So, I moved the
>> ExecInitRoutingInfo call after building the ri_returningList (and before
>> handling ON CONFLICT because we reference the conversion map created by
>> that when handling that clause).  The first version of the patch called
>> BeginForeignInsert in that order, but I updated the patch incorrectly.  :(
> 
> I didn't notice that when reviewing either.  Actually, I was under the
> impression that BeginForeignInsert consumes returningList from the
> ModifyTable node itself, not the ResultRelInfo, but now I see that
> ri_returningList was added exactly for BeginForeignInsert to consume.
> Good thing about that is that we get a Var-translated version instead of
> the original one that contains parent's attnos.
> 
>> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
>> and added that to ExecInitPartitionInfo right after the InitResultRelInfo
>> call, because it would be better to abort the
>> operation as soon as we find the partition to be non-routable.
> 
> Sounds fine.
> 
>> Another
>> thing I noticed is the RT index of the foreign partition set to 1 in
>> postgresBeginForeignInsert to create a remote query; that would not work
>> for cases where the partitioned table has an RT index larger than 1 (eg,
>> the partitioned table is not the primary ModifyTable node); in which
>> cases, since the varno of Vars belonging to the foreign partition in the
>> RETURNING expressions is the same as the partitioned table, calling
>> deparseReturningList with the RT index set to 1 against the RETURNING
>> expressions would produce attrs_used to be NULL, leading to postgres_fdw
>> not retrieving actually inserted data from the remote, even if remote
>> triggers might change those data.  So, I fixed this as well, by setting
>> the RT index to that of the partitioned table.
> 
> Check.
> 
>> Attached is a patch
>> for fixing these issues.  I'll add this to the open items list for PG11.
>>   (After addressing this, I'll post an updated version of the
>> fix-postgres_fdw-WCO-handling patch.)
> 
> Your patch seems to fix the issue and code changes seem fine to me.

Thanks for the review!

Best regards,
Etsuro Fujita



Re: Oddity in tuple routing for foreign partitions

2018-04-16 Thread Amit Langote
Fujita-san,

On 2018/04/16 20:25, Etsuro Fujita wrote:
> Hi,
> 
> While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
> the patch for tuple routing for foreign partitions doesn't work well
> with remote triggers.  Here is an example:
> 
> postgres=# create table loct1 (a int check (a in (1)), b text);
> postgres=# create function br_insert_trigfunc() returns trigger as
> $$begin new.b := new.b || ' triggered !'; return new; end$$ language
> plpgsql;
> postgres=# create trigger loct1_br_insert_trigger before insert on loct1
> for each row execute procedure br_insert_trigfunc();
> postgres=# create table itrtest (a int, b text) partition by list (a);
> postgres=# create foreign table remp1 (a int check (a in (1)), b text)
> server loopback options (table_name 'loct1');
> postgres=# alter table itrtest attach partition remp1 for values in (1);
> 
> This behaves oddly:
> 
> postgres=# insert into itrtest values (1, 'foo') returning *;
>  a |  b
> ---+-
>  1 | foo
> (1 row)
> 
> INSERT 0 1
> 
> The new value of b in the RETURNING result should be concatenated with '
> triggered !', as shown below:
> 
> postgres=# select tableoid::regclass, * from itrtest ;
>  tableoid | a |b
> --+---+-
>  remp1| 1 | foo triggered !
> (1 row)
> 
> The reason for that is: since ExecInitPartitionInfo calls
> BeginForeignInsert via ExecInitRoutingInfo before initializing
> ri_returningList of the partition's ResultRelInfo, BeginForeignInsert
> sees that ri_returningList is NULL and incorrectly concludes that
> the local query doesn't have a RETURNING list.

Good catch!

> So, I moved the
> ExecInitRoutingInfo call after building the ri_returningList (and before
> handling ON CONFLICT because we reference the conversion map created by
> that when handling that clause).  The first version of the patch called
> BeginForeignInsert in that order, but I updated the patch incorrectly.  :(

I didn't notice that when reviewing either.  Actually, I was under the
impression that BeginForeignInsert consumes returningList from the
ModifyTable node itself, not the ResultRelInfo, but now I see that
ri_returningList was added exactly for BeginForeignInsert to consume.
Good thing about that is that we get a Var-translated version instead of
the original one that contains parent's attnos.

> Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
> and added that to ExecInitPartitionInfo right after the InitResultRelInfo
> call, because it would be better to abort the
> operation as soon as we find the partition to be non-routable.

Sounds fine.

> Another
> thing I noticed is the RT index of the foreign partition set to 1 in
> postgresBeginForeignInsert to create a remote query; that would not work
> for cases where the partitioned table has an RT index larger than 1 (eg,
> the partitioned table is not the primary ModifyTable node); in which
> cases, since the varno of Vars belonging to the foreign partition in the
> RETURNING expressions is the same as the partitioned table, calling
> deparseReturningList with the RT index set to 1 against the RETURNING
> expressions would produce attrs_used to be NULL, leading to postgres_fdw
> not retrieving actually inserted data from the remote, even if remote
> triggers might change those data.  So, I fixed this as well, by setting
> the RT index to that of the partitioned table.

Check.

> Attached is a patch
> for fixing these issues.  I'll add this to the open items list for PG11.
>  (After addressing this, I'll post an updated version of the
> fix-postgres_fdw-WCO-handling patch.)

Your patch seems to fix the issue and code changes seem fine to me.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-16 Thread David Rowley
On 14 April 2018 at 05:04, Robert Haas  wrote:
> But I wonder why it's the executor's job to clean up after the
> planner, instead of adjusting the relevant planner functions to avoid
> leaking memory?

It might be possible, but it might also be risky and difficult.

For a while, during my review of the faster partition pruning patch I
was asking Amit to add pfree() calls in various places for this exact
reason, but in the end, I gave up and decided it was easier to just
create a new memory context to call the planner function from. I've
now forgotten the exact reason why I finally decided it was too much
trouble.  The pruning code now works using your step logic so perhaps
that reason no longer applies, although, on a quick scan of the
pruning code now, it seems to require that get_matching_partitions
performs a deep pfree of each PruneStepResult. However, there is still
partkey_datum_from_expr which performs ExecInitExpr, although perhaps
that can just be done once, and the result stashed in the
PartitionPruneContext.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Proposal: Adding json logging

2018-04-16 Thread Peter Eisentraut
On 4/13/18 20:00, David Arnold wrote:
> I have reviewed some log samples and all DO contain some kind of multi
> line logs which are very uncomfortable to parse reliably in a log streamer.
> 
> I asked Michael Paquier about his
> solution: https://github.com/michaelpq/pg_plugins/tree/master/jsonlog
> He suggested taking action and proposing this extension again to be
> included in contrib:
> https://github.com/michaelpq/pg_plugins/issues/24 

I have used https://github.com/mpihlak/pg_logforward in the past, which
seems to be about the same thing.

I have also had good success using syslog.  While syslog is not very
structured, the setting syslog_split_messages allows sending log entries
that include newlines in one piece, which works well if you have some
kind of full-text search engine at the receiving end.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Instability in partition_prune test?

2018-04-16 Thread David Rowley
On 17 April 2018 at 09:31, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> Seems reasonable.  I'm still uncomfortable with the assumption
>>> that if we ask for two workers we will get two workers, but
>>> that's a pre-existing problem in other parallel regression tests.
>
>> Yeah, I was looking at that line and wondering.  But I think that'd
>> require a different approach (*if* we see it fail, which I'm not sure we
>> have), such as suppressing the Workers Launched lines without a plpgsql
>> function to do it, since it's much more prevalent than this problem.
>
> At least in this case, some of the "row" counts also depend on number
> of workers, no?  So just hiding that line wouldn't do it.
>
> Anyway, I agree that we shouldn't solve that problem until we see
> that it's a problem in practice.

I agree. The solution to that problem, if it ever comes up, may end up
just being to run the particular test in a parallel group with just
that test, or with fewer tests.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Instability in partition_prune test?

2018-04-16 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Seems reasonable.  I'm still uncomfortable with the assumption
>> that if we ask for two workers we will get two workers, but
>> that's a pre-existing problem in other parallel regression tests.

> Yeah, I was looking at that line and wondering.  But I think that'd
> require a different approach (*if* we see it fail, which I'm not sure we
> have), such as suppressing the Workers Launched lines without a plpgsql
> function to do it, since it's much more prevalent than this problem.

At least in this case, some of the "row" counts also depend on number
of workers, no?  So just hiding that line wouldn't do it.

Anyway, I agree that we shouldn't solve that problem until we see
that it's a problem in practice.

regards, tom lane



Re: Instability in partition_prune test?

2018-04-16 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Yeah, loss of executor code coverage was what concerned me.
> 
> > Here's a proposed patch for this.
> 
> Seems reasonable.  I'm still uncomfortable with the assumption
> that if we ask for two workers we will get two workers, but
> that's a pre-existing problem in other parallel regression tests.

Yeah, I was looking at that line and wondering.  But I think that'd
require a different approach (*if* we see it fail, which I'm not sure we
have), such as suppressing the Workers Launched lines without a plpgsql
function to do it, since it's much more prevalent than this problem.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Instability in partition_prune test?

2018-04-16 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Yeah, loss of executor code coverage was what concerned me.

> Here's a proposed patch for this.

Seems reasonable.  I'm still uncomfortable with the assumption
that if we ask for two workers we will get two workers, but
that's a pre-existing problem in other parallel regression tests.

regards, tom lane



Re: Instability in partition_prune test?

2018-04-16 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> It seems quite silly to be asking for a parallel plan and then insisting
> >> it not run in parallel.
> 
> > Now that you mention it, this probably decreases coverage for the
> > choose_next_subplan_for_worker function.
> 
> Yeah, loss of executor code coverage was what concerned me.

Here's a proposed patch for this.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit e8ec68c15788edd8dc01463fc2df9526faab4bdc
Author: Alvaro Herrera 
AuthorDate: Mon Apr 16 16:54:41 2018 -0300
CommitDate: Mon Apr 16 17:42:53 2018 -0300

Restore partition_prune's usage of parallel workers

This reverts commit 4d0f6d3f207d ("Attempt to stabilize partition_prune
test output (2)"), and attempts to stabilize the test by using string
replacement to hide any loop count difference in parallel nodes.

Discussion: https://postgr.es/m/4475.1523628...@sss.pgh.pa.us

diff --git a/src/test/regress/expected/partition_prune.out 
b/src/test/regress/expected/partition_prune.out
index 41d38d3d9a..49a46fa69f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -1748,16 +1748,34 @@ explain (analyze, costs off, summary off, timing off) 
execute ab_q3 (2, 2);
 (10 rows)
 
 -- Parallel append
+-- Suppress the number of loops each node runs for.  This is because workers
+-- may run a parallel node more than once, if timing conditions are just so,
+-- which destabilizes the test.
+create function explain_parallel_append(text, int[]) returns setof text
+language plpgsql as
+$$
+declare
+ln text;
+args text := string_agg(u::text, ', ') from unnest($2) u;
+begin
+for ln in
+execute format('explain (analyze, costs off, summary off, timing off) 
execute %s(%s)',
+$1, args)
+loop
+if ln like '%Parallel%' then
+ln := regexp_replace(ln, 'loops=\d*',  'loops=N');
+end if;
+return next ln;
+end loop;
+end;
+$$;
 prepare ab_q4 (int, int) as
 select avg(a) from ab where a between $1 and $2 and b < 4;
 -- Encourage use of parallel plans
 set parallel_setup_cost = 0;
 set parallel_tuple_cost = 0;
 set min_parallel_table_scan_size = 0;
--- set this so we get a parallel plan
 set max_parallel_workers_per_gather = 2;
--- and zero this so that workers don't destabilize the explain output
-set max_parallel_workers = 0;
 -- Execute query 5 times to allow choose_custom_plan
 -- to start considering a generic plan.
 execute ab_q4 (1, 8);
@@ -1790,21 +1808,21 @@ execute ab_q4 (1, 8);
 
 (1 row)
 
-explain (analyze, costs off, summary off, timing off) execute ab_q4 (2, 2);
-  QUERY PLAN   
+select explain_parallel_append('ab_q4', '{2, 2}');
+explain_parallel_append
 ---
  Finalize Aggregate (actual rows=1 loops=1)
-   ->  Gather (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
  Workers Planned: 2
- Workers Launched: 0
- ->  Partial Aggregate (actual rows=1 loops=1)
-   ->  Parallel Append (actual rows=0 loops=1)
+ Workers Launched: 2
+ ->  Partial Aggregate (actual rows=1 loops=3)
+   ->  Parallel Append (actual rows=0 loops=N)
  Subplans Removed: 6
- ->  Parallel Seq Scan on ab_a2_b1 (actual rows=0 loops=1)
+ ->  Parallel Seq Scan on ab_a2_b1 (actual rows=0 loops=N)
Filter: ((a >= $1) AND (a <= $2) AND (b < 4))
- ->  Parallel Seq Scan on ab_a2_b2 (actual rows=0 loops=1)
+ ->  Parallel Seq Scan on ab_a2_b2 (actual rows=0 loops=N)
Filter: ((a >= $1) AND (a <= $2) AND (b < 4))
- ->  Parallel Seq Scan on ab_a2_b3 (actual rows=0 loops=1)
+ ->  Parallel Seq Scan on ab_a2_b3 (actual rows=0 loops=N)
Filter: ((a >= $1) AND (a <= $2) AND (b < 4))
 (13 rows)
 
@@ -1843,59 +1861,59 @@ execute ab_q5 (1, 2, 3);
 
 (1 row)
 
-explain (analyze, costs off, summary off, timing off) execute ab_q5 (1, 1, 1);
-  QUERY PLAN   
+select explain_parallel_append('ab_q5', '{1, 1, 1}');
+explain_parallel_append
 ---
  Finalize Aggregate (actual rows=1 loops=1)
-   ->  Gather (actual rows=1 loops=1)
+   ->  Gather (actual rows=3 loops=1)
  Workers Planned: 2
- Workers Launched: 0
- ->  Partial Aggregate (actual 

Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Guilherme Pereira
Some extra info which might help: after increasing parallel_setup_cost to
a value of 4500, Postgres doesn't choose the parallel plan anymore, making
the query fast again.

db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# set parallel_setup_cost = 4500;
SET
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# explain analyze SELECT count(*)
FROM f_ticketupdate_aad5jtwal0ayaax AS f
INNER JOIN
  dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
  ON (f.dt_event_id = d.id)
WHERE ( 6171 = d."id_euweek" );

 QUERY PLAN
-
 Aggregate  (cost=1508646.93..1508646.94 rows=1 width=8) (actual
time=0.067..0.067 rows=1 loops=1)
   ->  Nested Loop  (cost=1638.24..1508474.08 rows=69140 width=0) (actual
time=0.064..0.064 rows=0 loops=1)
 ->  Bitmap Heap Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z d
(cost=4.34..27.35 rows=7 width=4) (actual time=0.016..0.040 rows=7 loops=1)
   Recheck Cond: (6171 = id_euweek)
   Heap Blocks: exact=7
   ->  Bitmap Index Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx
(cost=0.00..4.33 rows=7 width=0) (actual time=0.010..0.010 rows=7 loops=1)
 Index Cond: (6171 = id_euweek)
 ->  Bitmap Heap Scan on f_ticketupdate_aad5jtwal0ayaax f
(cost=1633.90..214617.67 rows=87472 width=4) (actual time=0.002..0.002
rows=0 loops=7)
   Recheck Cond: (dt_event_id = d.id)
   ->  Bitmap Index Scan on
f_ticketupdate_aad5jtwal0ayaax_dt_event_id_idx  (cost=0.00..1612.03
rows=87472 width=0) (actual time=0.002..0.002 rows=0 loops=7)
 Index Cond: (dt_event_id = d.id)
 Planning time: 0.528 ms
 Execution time: 0.144 ms

On Mon, 16 Apr 2018 at 19:16 Guilherme Pereira <
guilherme.pere...@gooddata.com> wrote:

> Hope it's fine to jump in.
>
> db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=#  set
> max_parallel_workers_per_gather=0;
> SET
> db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# explain analyze SELECT count(*)
> FROM f_ticketupdate_aad5jtwal0ayaax AS f
> INNER JOIN
>   dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
>   ON (f.dt_event_id = d.id)
> WHERE ( 6171 = d."id_euweek" );
>
>QUERY PLAN
>
> -
>  Aggregate  (cost=1508646.93..1508646.94 rows=1 width=8) (actual
> time=0.145..0.145 rows=1 loops=1)
>->  Nested Loop  (cost=1638.24..1508474.08 rows=69140 width=0) (actual
> time=0.142..0.142 rows=0 loops=1)
>  ->  Bitmap Heap Scan on
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z d
> (cost=4.34..27.35 rows=7 width=4) (actual time=0.043..0.103 rows=7 loops=1)
>Recheck Cond: (6171 = id_euweek)
>Heap Blocks: exact=7
>->  Bitmap Index Scan on
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx
> (cost=0.00..4.33 rows=7 width=0) (actual time=0.036..0.036 rows=7 loops=1)
>  Index Cond: (6171 = id_euweek)
>  ->  Bitmap Heap Scan on f_ticketupdate_aad5jtwal0ayaax f
> (cost=1633.90..214617.67 rows=87472 width=4) (actual time=0.003..0.003
> rows=0 loops=7)
>Recheck Cond: (dt_event_id = d.id)
>->  Bitmap Index Scan on
> f_ticketupdate_aad5jtwal0ayaax_dt_event_id_idx  (cost=0.00..1612.03
> rows=87472 width=0) (actual time=0.003..0.003 rows=0 loops=7)
>  Index Cond: (dt_event_id = d.id)
>  Planning time: 0.496 ms
>  Execution time: 0.227 ms
> (13 rows)
>
> db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=#  set
> max_parallel_workers_per_gather=2;
> SET
> db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# explain analyze SELECT count(*)
> FROM f_ticketupdate_aad5jtwal0ayaax AS f
> INNER JOIN
>   dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
>   ON (f.dt_event_id = d.id)
> WHERE ( 6171 = d."id_euweek" );
>
> QUERY PLAN
>
> ---
>  Finalize Aggregate  (cost=1490623.06..1490623.07 rows=1 width=8) (actual
> time=9604.745..9604.745 rows=1 loops=1)
>->  Gather  (cost=1490622.85..1490623.06 rows=2 width=8) (actual
> time=9604.707..9604.739 rows=3 loops=1)
>  Workers Planned: 2
>  Workers Launched: 2
>  ->  Partial Aggregate  (cost=1489622.85..1489622.86 rows=1
> width=8) (actual time=9600.255..9600.255 rows=1 loops=3)
>->  Hash Join  (cost=27.44..1489550.83 rows=28808 width=0)
> (actual time=9600.249..9600.249 rows=0 loops=3)
>  Hash Cond: 

Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Tomas Vondra
Thanks. So we now have a trivial query demonstrating the issue. IMHO
this is not really a costing issue, but due to a misestimate.

Essentially, the problem is that the two sides of the join mismatch,
causing this:

->  Bitmap Heap Scan on dwh_dm ... d  (... rows=7 width=4) (...)

->  Bitmap Heap Scan on f_ticketupdate_aad5jtwal0ayaax f
  (cost=1633.90..214617.67 rows=87472 width=4)
  (actual time=0.003..0.003 rows=0 loops=7)
Recheck Cond: (dt_event_id = d.id)

->  Bitmap Index Scan on f_ticketupdate_aad5jtwal0ayaa ...
  (cost=0.00..1612.03 rows=87472 width=0)
  (actual time=0.003..0.003 rows=0 loops=7)
Index Cond: (dt_event_id = d.id)

I.e. the database believes the bitmap index scan will match 87k rows.
But in fact it matches 0, which makes the bitmap heap scan entirely
unnecessary (thus costing nothing, because it's skipped).

Of course, the parallel plan is structured slightly differently, and
does not allow this skipping because it places the f_ table on the outer
side of the join (and scans it using sequential scan).

Now, try changing the parameters (particularly id_euweek) so that the
bitmap index scan actually matches something. I'm pretty sure that will
make the non-parallel case much more expensive.
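
For example, something along these lines (the week value 6170 is purely
illustrative; pick one that actually has matching fact rows):

explain analyze SELECT count(*)
FROM f_ticketupdate_aad5jtwal0ayaax AS f
INNER JOIN
  dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
  ON (f.dt_event_id = d.id)
WHERE ( 6170 = d."id_euweek" );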

Increasing the parallel_setup_cost makes the parallel plan a bit more
expensive, enough to switch to the non-parallel plan. But that's mostly
a fluke and not a particularly principled way to fix this - if the cost
difference gets a bit larger (or if you increase the number of parallel
workers) it's probably going to use the parallel plan again.

Obviously, PostgreSQL 9.5 doesn't have parallel queries, so it does not
have a chance of making this mistake.

regards
-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-16 Thread Tom Lane
Alvaro Herrera  writes:
> Pushed now, thanks.

Buildfarm doesn't like this even a little bit.

regards, tom lane



Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-16 Thread Alvaro Herrera
Amit Langote wrote:

> The solution I came up with is to call map_variable_attnos() directly,
> instead of going through map_partition_varattnos() every time, after first
> creating the attribute map ourselves.

Yeah, sounds good.  I added a tweak: if the tupledescs are equal, there
should be no need to do any mapping.

(Minor adjustment to the test: "cuarenta" means forty, so I changed the
new test to say "cincuenta" instead, which means fifty).

Pushed now, thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Proposal: Adding json logging

2018-04-16 Thread David Arnold
> In CSV a line break inside a field is easy to process for
> a parser, because (per https://tools.ietf.org/html/rfc4180):
> "Fields containing line breaks (CRLF), double quotes, and commas should be
> enclosed in double-quotes"

Interesting, does that implicitly mean the whole log event would get
transmitted as a single "line" (with CRLF) in CSV?  I don't know how to
confirm or falsify that...

In the affirmative scenario, this then would work for a true streaming
aggregator (if CSV was supported).

Wouldn't it continue to be resource-expensive if for some reason someone
wants to keep tailing logs out of a log file, which, as I understand it,
has line-by-line semantics?

I do not plan to do that, but there are people who prefer to also have a
local copy of their logfile just in case.

But if CSV, as emitted by postgres, would be supported by fluentbit I would
be equally happy with that solution.

The second order problem, after all is just that: of second order...

El lun., 16 abr. 2018, 1:28 p.m., Daniel Verite 
escribió:

> David Arnold wrote:
>
> > Not claiming this assumption does imply parsing of a *rolling* set
> > of log lines with *previously unknown cardinality*. That's expensive
> > on computing resources. I don't have actual numbers, but it doesn't
> > seem too far-fetched, either.
> > I filed a question to the author of fluent-bit to that effect, which
> > you can consult here:
> > https://github.com/fluent/fluent-bit/issues/564 Let's see what
> > Eduardo has to inform us about this...
>
> fluent-bit does not appear to support CSV, as mentioned in
> https://github.com/fluent/fluent-bit/issues/459
> which got flagged as an enhancement request some time ago.
>
> In CSV a line break inside a field is easy to process for
> a parser, because (per https://tools.ietf.org/html/rfc4180):
>
>   "Fields containing line breaks (CRLF), double quotes, and commas
> should be enclosed in double-quotes"
>
> So there is no look ahead to do. In a character-by-character loop,
> when encountering a line break, either the current field did not
> start with a double quote and the line break ends the current record,
> or it did start with a double quote and the line break is part of the
> content.
>
> What doesn't quite work is parsing CSV with a regex; that's
> discussed in some detail here for instance:
>
> https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: Proposal: Adding json logging

2018-04-16 Thread Daniel Verite
David Arnold wrote:

> Not claiming this assumption does imply parsing of a *rolling* set
> of log lines with *previously unknown cardinality*. That's expensive
> on computing resources. I don't have actual numbers, but it doesn't
> seem too far-fetched, either.
> I filed a question to the author of fluent-bit to that effect, which
> you can consult here:
> https://github.com/fluent/fluent-bit/issues/564 Let's see what
> Eduardo has to inform us about this...

fluent-bit does not appear to support CSV, as mentioned in
https://github.com/fluent/fluent-bit/issues/459
which got flagged as an enhancement request some time ago.

In CSV a line break inside a field is easy to process for
a parser, because (per https://tools.ietf.org/html/rfc4180):

  "Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes"

So there is no look ahead to do. In a character-by-character loop,
when encountering a line break, either the current field did not
start with a double quote and the line break ends the current record,
or it did start with a double quote and the line break is part of the
content.
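
As a quick hedged illustration, Postgres' own CSV handling behaves exactly
this way (the file path and contents below are made up):

CREATE TABLE csv_demo (id int, message text);
COPY csv_demo FROM '/tmp/demo.csv' WITH (FORMAT csv);
-- with /tmp/demo.csv containing:
--   1,"first line
--   second line"
-- the embedded newline is preserved inside a single row's message field.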

What doesn't quite work is parsing CSV with a regex; that's
discussed in some detail here for instance:
https://softwareengineering.stackexchange.com/questions/166454/can-the-csv-format-be-defined-by-a-regex


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Guilherme Pereira
Hope it's fine to jump in.

db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=#  set
max_parallel_workers_per_gather=0;
SET
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# explain analyze SELECT count(*)
FROM f_ticketupdate_aad5jtwal0ayaax AS f
INNER JOIN
  dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
  ON (f.dt_event_id = d.id)
WHERE ( 6171 = d."id_euweek" );

 QUERY PLAN
-
 Aggregate  (cost=1508646.93..1508646.94 rows=1 width=8) (actual
time=0.145..0.145 rows=1 loops=1)
   ->  Nested Loop  (cost=1638.24..1508474.08 rows=69140 width=0) (actual
time=0.142..0.142 rows=0 loops=1)
 ->  Bitmap Heap Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z d
(cost=4.34..27.35 rows=7 width=4) (actual time=0.043..0.103 rows=7 loops=1)
   Recheck Cond: (6171 = id_euweek)
   Heap Blocks: exact=7
   ->  Bitmap Index Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx
(cost=0.00..4.33 rows=7 width=0) (actual time=0.036..0.036 rows=7 loops=1)
 Index Cond: (6171 = id_euweek)
 ->  Bitmap Heap Scan on f_ticketupdate_aad5jtwal0ayaax f
(cost=1633.90..214617.67 rows=87472 width=4) (actual time=0.003..0.003
rows=0 loops=7)
   Recheck Cond: (dt_event_id = d.id)
   ->  Bitmap Index Scan on
f_ticketupdate_aad5jtwal0ayaax_dt_event_id_idx  (cost=0.00..1612.03
rows=87472 width=0) (actual time=0.003..0.003 rows=0 loops=7)
 Index Cond: (dt_event_id = d.id)
 Planning time: 0.496 ms
 Execution time: 0.227 ms
(13 rows)

db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=#  set
max_parallel_workers_per_gather=2;
SET
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# explain analyze SELECT count(*)
FROM f_ticketupdate_aad5jtwal0ayaax AS f
INNER JOIN
  dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
  ON (f.dt_event_id = d.id)
WHERE ( 6171 = d."id_euweek" );

  QUERY PLAN
---
 Finalize Aggregate  (cost=1490623.06..1490623.07 rows=1 width=8) (actual
time=9604.745..9604.745 rows=1 loops=1)
   ->  Gather  (cost=1490622.85..1490623.06 rows=2 width=8) (actual
time=9604.707..9604.739 rows=3 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Partial Aggregate  (cost=1489622.85..1489622.86 rows=1
width=8) (actual time=9600.255..9600.255 rows=1 loops=3)
   ->  Hash Join  (cost=27.44..1489550.83 rows=28808 width=0)
(actual time=9600.249..9600.249 rows=0 loops=3)
 Hash Cond: (f.dt_event_id = d.id)
 ->  Parallel Seq Scan on
f_ticketupdate_aad5jtwal0ayaax f  (cost=0.00..1185867.47 rows=24054847
width=4) (actual time=0.076..4955.525 rows=19243863 loops=3)
 ->  Hash  (cost=27.35..27.35 rows=7 width=4) (actual
time=0.099..0.099 rows=7 loops=3)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  Bitmap Heap Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z d
(cost=4.34..27.35 rows=7 width=4) (actual time=0.045..0.085 rows=7 loops=3)
 Recheck Cond: (6171 = id_euweek)
 Heap Blocks: exact=7
 ->  Bitmap Index Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx
(cost=0.00..4.33 rows=7 width=0) (actual time=0.032..0.032 rows=7 loops=3)
   Index Cond: (6171 = id_euweek)
 Planning time: 0.616 ms
 Execution time: 9611.924 ms
(17 rows)

On Mon, Apr 16, 2018 at 4:53 PM Pavel Stehule 
wrote:

>
> -- Forwarded message -
> From: Tomas Vondra 
> Date: po 16. 4. 2018 16:14
> Subject: Re: very slow queries when max_parallel_workers_per_gather is
> higher than zero
> To: Pavel Stehule 
> Cc: PostgreSQL Hackers 
>
>
> Apologies, the reduced query was missing a where condition on id_week:
>
> SELECT count(*)
> FROM f_ticketupdate_aad5jtwal0ayaax AS f
> INNER JOIN
>   dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
>   ON (f.dt_event_id = d.id)
> WHERE ( 6171 = d."id_euweek" )
>
>
> regards
>
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Overcoming SELECT ... FOR UPDATE permission restrictions

2018-04-16 Thread Alexander Lakhin

13.04.2018 18:55, Tom Lane wrote:

Although this is arguably a security bug, I'm not sure we should
back-patch it.  The consequences seem relatively minor, and the
behavioral change carries a significant risk of breaking applications
that worked as-intended up to now.  Thoughts?

The worst scenario (with the current system views) I could think of is:
user=> CREATE VIEW pgg AS SELECT * FROM pg_group;
BEGIN TRANSACTION; SELECT * FROM pgg FOR UPDATE; SELECT pg_sleep(60); 
ROLLBACK;

and the parallel operation:
admin=> DROP ROLE testrole;
hangs for one minute.
But the admin can observe the locks and kill the offending backend, so it's
hardly a critical issue.





Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2018-04-16 Thread Liudmila Mantrova

Hi everyone,

When translating doc updates, Alexander Lakhin noticed that trigram 
examples were not quite accurate.

A small patch fixing this issue is attached.


On 03/21/2018 03:35 PM, Teodor Sigaev wrote:

Thank you, pushed

David Steele wrote:

On 3/6/18 7:04 AM, Teodor Sigaev wrote:

I agree with Teodor (upthread, not quoted here) that the documentation
could use some editing.

I started to do it myself, but quickly realized I have no knowledge of
the content.  I'm afraid I would destroy the meaning while updating the
grammar.

Anyone understand the subject matter well enough to review the
documentation?


Liudmila tried to improve docs in Alexander's patchset.

https://www.postgresql.org/message-id/f43b242d-000c-f4c8-cb8b-d37e9752c...@postgrespro.ru 



This looks good to me with a few minor exceptions:

+   word_similarity(text, text) requires further
+   explanation. Consider the following example:

Maybe too verbose?  I think "word_similarity(text,
text) requires further explanation." can be removed entirely.

+   string.  However, this function does not add paddings to the

"add padding"


BTW, adding Liudmila's message to commitfest task
(https://commitfest.postgresql.org/17/1403/) doesn't work


Doesn't work for me either.

Alexander, can you post the final patches to the thread so they show up
in the CF app?

Thanks,





--
Liudmila Mantrova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 8f39529..be43cdf 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -152,9 +152,9 @@
 
 
In the first string, the set of trigrams is
-   {"  w"," wo","ord","wor","rd "}.
+   {"  w"," wo","wor","ord","rd "}.
In the second string, the ordered set of trigrams is
-   {"  t"," tw",two,"wo ","  w"," wo","wor","ord","rds", ds "}.
+   {"  t"," tw","two","wo ","  w"," wo","wor","ord","rds","ds "}.
The most similar extent of an ordered set of trigrams in the second string
is {"  w"," wo","wor","ord"}, and the similarity is
0.8.
@@ -172,7 +172,7 @@
At the same time, strict_word_similarity(text, text)
has to select an extent that matches word boundaries.  In the example above,
strict_word_similarity(text, text) would select the
-   extent {"  w"," wo","wor","ord","rds", ds "}, which
+   extent {"  w"," wo","wor","ord","rds","ds "}, which
corresponds to the whole word 'words'.
 
 


Re: [patch] pg_attribute.attndims turns to 0 when 'create table like/as'

2018-04-16 Thread Alexander Kuzmenkov

On 04/16/2018 05:01 PM, Alexey Bashtanov wrote:
As reported in [1], pg_attribute.attndims turns to 0 when 'create 
table like/as'.

The patch attached is to fix it.



Hi Alexey,

Judging from the discussion in [1], attndims is deprecated. Could you 
describe what you are trying to achieve with it?


[1] https://www.postgresql.org/message-id/3792.1485959113%40sss.pgh.pa.us

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Fix slot's xmin advancement and subxact's lost snapshots in decoding.

2018-04-16 Thread Arseny Sher
(delicate ping)



Re: Proposal: Adding json logging

2018-04-16 Thread David Arnold
*Hi all,*

This discussion has made big steps forward. It is very encouraging to see
this amount of interest. It seems that this has been around at the back of
many minds for some time already...

Thanks to Christophe's friendly reminder, I aim to define the problem
space as concisely as possible.
I think what we have left to clarify is the root reason for the observation
that log streaming/processing tools do not embrace multi-line entries.
Then, hopefully, a final decision can be reached on whether this is an
admissible problem worth addressing or not from the Postgres side.

In case it is, it can additionally be decided whether it's also a UX aspect
that should be improved (-> JSON/logfmt) on this occasion.

Let's hope Eduardo, the maker of fluent-bit, soon finds time to tell us what
he has to say about the multi-line problem in log parsing.

*Best, *David

El lun., 16 abr. 2018 a las 9:41, David Fetter ()
escribió:

> On Mon, Apr 16, 2018 at 10:06:29AM -0400, Andrew Dunstan wrote:
> > On 04/15/2018 05:05 PM, Christophe Pettus wrote:
> > >> On Apr 15, 2018, at 12:16, David Arnold  wrote:
> > >>
> > >> Core-Problem: "Multi line logs are unnecessarily inconvenient to
> parse and are not compatible with the design of some (commonly used)
> logging aggregation flows."
> > > I'd argue that the first line of attack on that should be to explain
> to those consumers of logs that they are making some unwarranted
> assumptions about the kind of inputs they'll be seeing.  PostgreSQL's CSV
> log formats are not a particular bizarre format, or very difficult to
> parse.  The standard Python CSV library handles them just file, for
> example.  You have to handle newlines that are part of a log message
> somehow; a newline in a PostgreSQL query, for example, needs to be emitted
> to the logs.
> >
> >
> > In JSON newlines would have to be escaped, since literal newlines are
> > not legal in JSON strings. Postgres' own CSV parser has no difficulty at
> > all in handling newlines embedded in the fields of CSV logs.
>
> True, and anything that malloc()s in the process of doing that
> escaping could fail on OOM, and hilarity would ensue. I don't see
> these as show-stoppers, or even as super relevant to the vast majority
> of users. If you're that close to the edge, you were going to crash
> anyhow.
>
> > I'm not necessarily opposed to providing for JSON logs, but the
> > overhead of named keys could get substantial. Abbreviated keys might
> > help, but generally I think I would want to put such logs on a
> > compressed ZFS drive or some such.
>
> Frequently at places I've worked, the end destination is of less
> immediate concern than the ability to process those logs for
> near-real-time monitoring.  This is where formats like JSON really
> shine.
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778 <(415)%20235-3778>
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
-- 
DAVID ARNOLD
Gerente General
xoe.solutions
dar@xoe.solutions
+57 (315) 304 13 68
*Confidentiality Note: * This email may contain confidential and/or private
information. If you received this email in error please delete and notify
sender.
*Environmental Consideration: * Please avoid printing this email on paper,
unless really necessary.


Re: Proposal: Adding json logging

2018-04-16 Thread David Fetter
On Mon, Apr 16, 2018 at 10:06:29AM -0400, Andrew Dunstan wrote:
> On 04/15/2018 05:05 PM, Christophe Pettus wrote:
> >> On Apr 15, 2018, at 12:16, David Arnold  wrote:
> >>
> >> Core-Problem: "Multi line logs are unnecessarily inconvenient to parse and 
> >> are not compatible with the design of some (commonly used) logging 
> >> aggregation flows."
> > I'd argue that the first line of attack on that should be to explain to 
> > those consumers of logs that they are making some unwarranted assumptions 
> > about the kind of inputs they'll be seeing.  PostgreSQL's CSV log formats 
> > are not a particular bizarre format, or very difficult to parse.  The 
> > standard Python CSV library handles them just file, for example.  You have 
> > to handle newlines that are part of a log message somehow; a newline in a 
> > PostgreSQL query, for example, needs to be emitted to the logs.
> 
> 
> In JSON newlines would have to be escaped, since literal newlines are
> not legal in JSON strings. Postgres' own CSV parser has no difficulty at
> all in handling newlines embedded in the fields of CSV logs.

True, and anything that malloc()s in the process of doing that
escaping could fail on OOM, and hilarity would ensue. I don't see
these as show-stoppers, or even as super relevant to the vast majority
of users. If you're that close to the edge, you were going to crash
anyhow.
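
As a quick illustration of the escaping Andrew mentions, using Postgres'
own JSON conversion (the input string is made up):

SELECT to_json(E'line one\nline two'::text);
-- result: "line one\nline two"  (the newline becomes a two-character escape)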

> I'm not necessarily opposed to providing for JSON logs, but the
> overhead of named keys could get substantial. Abbreviated keys might
> help, but generally I think I would want to put such logs on a
> compressed ZFS drive or some such.

Frequently at places I've worked, the end destination is of less
concern immediate than the ability to process those logs for
near-real-time monitoring.  This is where formats like JSON really
shine.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Deadlock in multiple CIC.

2018-04-16 Thread Tom Lane
I wrote:
> As an investigative measure, I propose that we insert
>   Assert(MyPgXact->xmin == InvalidTransactionId);
> into 9.4's DefineIndex, just after its InvalidateCatalogSnapshot call.

Well, isn't this interesting:

TRAP: FailedAssertion("!(MyPgXact->xmin == ((TransactionId) 0))", File: 
"indexcmds.c", Line: 777)
2018-04-16 02:41:25.814 PDT [5ad46fbd.5412:2] LOG:  server process (PID 21777) 
was terminated by signal 6: Aborted
2018-04-16 02:41:25.814 PDT [5ad46fbd.5412:3] DETAIL:  Failed process was 
running: CREATE INDEX CONCURRENTLY concur_index1 ON concur_heap(f2,f1);

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=okapi&dt=2018-04-16%2009%3A35%3A02

So we can now refine the problem statement to "SnapshotResetXmin isn't
doing what it's supposed to".  No idea why yet.  9.4 is using a simple
RegisteredSnapshots counter which 9.5 has replaced with a pairing heap,
so you'd think the newer code would be *more* likely to have bugs...

regards, tom lane



Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Tomas Vondra
Apologies, the reduced query was missing a where condition on id_week:

SELECT count(*)
FROM f_ticketupdate_aad5jtwal0ayaax AS f
INNER JOIN
  dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
  ON (f.dt_event_id = d.id)
WHERE ( 6171 = d."id_euweek" )


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



[patch] pg_attribute.attndims turns to 0 when 'create table like/as'

2018-04-16 Thread Alexey Bashtanov
As reported in [1], pg_attribute.attndims turns to 0 when using 'create 
table like/as'.

The attached patch fixes it.
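
For example, an illustrative reproduction (table and column names invented
here), using a two-dimensional array column so the loss is visible:

create table src (arr int[][]);
create table dst (like src);
select attrelid::regclass, attname, attndims
  from pg_attribute
 where attrelid in ('src'::regclass, 'dst'::regclass) and attname = 'arr';
-- without the fix, dst.arr reports attndims = 0 instead of 2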

Best Regards,
  Alexey

[1] 
https://www.postgresql.org/message-id/20150707072942.1186.98151%40wrigleys.postgresql.org
diff --git a/src/backend/nodes/makefuncs.c b/src/backend/nodes/makefuncs.c
new file mode 100644
index 1bd2599..8fff3d2
*** a/src/backend/nodes/makefuncs.c
--- b/src/backend/nodes/makefuncs.c
*** makeTypeNameFromOid(Oid typeOid, int32 t
*** 475,480 
--- 475,498 
  	n->typeOid = typeOid;
  	n->typemod = typmod;
  	n->location = -1;
+ 
+ 	return n;
+ }
+ 
+ /*
+  * makeTypeNameWithNdimsFromOid -
+  *	build a TypeName node to represent a type already known by OID/typmod/ndims
+  */
+ TypeName *
+ makeTypeNameWithNdimsFromOid(Oid typeOid, int32 typmod, int32 ndims)
+ {
+ 	int i;
+ 	TypeName   *n = makeTypeNameFromOid(typeOid, typmod);
+ 
+ 	n->arrayBounds = NIL;
+ 	for (i = 0; i < ndims; i++)
+ 		n->arrayBounds = lcons_int(-1, n->arrayBounds);
+ 
  	return n;
  }
  
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
new file mode 100644
index c6f3628..3eeb5cd
*** a/src/backend/parser/parse_utilcmd.c
--- b/src/backend/parser/parse_utilcmd.c
*** transformTableLikeClause(CreateStmtConte
*** 981,988 
  		 */
  		def = makeNode(ColumnDef);
  		def->colname = pstrdup(attributeName);
! 		def->typeName = makeTypeNameFromOid(attribute->atttypid,
! 			attribute->atttypmod);
  		def->inhcount = 0;
  		def->is_local = true;
  		def->is_not_null = attribute->attnotnull;
--- 981,989 
  		 */
  		def = makeNode(ColumnDef);
  		def->colname = pstrdup(attributeName);
! 		def->typeName = makeTypeNameWithNdimsFromOid(attribute->atttypid,
! 	 attribute->atttypmod,
! 	 attribute->attndims);
  		def->inhcount = 0;
  		def->is_local = true;
  		def->is_not_null = attribute->attnotnull;
diff --git a/src/include/nodes/makefuncs.h b/src/include/nodes/makefuncs.h
new file mode 100644
index 57bd52f..8db4b10
*** a/src/include/nodes/makefuncs.h
--- b/src/include/nodes/makefuncs.h
*** extern RangeVar *makeRangeVar(char *sche
*** 71,76 
--- 71,77 
  extern TypeName *makeTypeName(char *typnam);
  extern TypeName *makeTypeNameFromNameList(List *names);
  extern TypeName *makeTypeNameFromOid(Oid typeOid, int32 typmod);
+ extern TypeName *makeTypeNameWithNdimsFromOid(Oid typeOid, int32 typmod, int32 ndims);
  
  extern ColumnDef *makeColumnDef(const char *colname,
  			  Oid typeOid, int32 typmod, Oid collOid);


Re: Proposal: Adding json logging

2018-04-16 Thread Andrew Dunstan


On 04/15/2018 05:05 PM, Christophe Pettus wrote:
>> On Apr 15, 2018, at 12:16, David Arnold  wrote:
>>
>> Core-Problem: "Multi line logs are unnecessarily inconvenient to parse and 
>> are not compatible with the design of some (commonly used) logging 
>> aggregation flows."
> I'd argue that the first line of attack on that should be to explain to those 
> consumers of logs that they are making some unwarranted assumptions about the 
> kind of inputs they'll be seeing.  PostgreSQL's CSV log formats are not a 
> particularly bizarre format, or very difficult to parse.  The standard Python 
> CSV library handles them just fine, for example.  You have to handle newlines 
> that are part of a log message somehow; a newline in a PostgreSQL query, for 
> example, needs to be emitted to the logs.
>


In JSON newlines would have to be escaped, since literal newlines are
not legal in JSON strings. Postgres' own CSV parser has no difficulty at
all in handling newlines embedded in the fields of CSV logs.

I'm not necessarily opposed to providing for JSON logs, but the overhead
of named keys could get substantial. Abbreviated keys might help, but
generally I think I would want to put such logs on a compressed ZFS
drive or some such.

cheers

andrew

-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Pavel Stehule
2018-04-16 15:52 GMT+02:00 Tomas Vondra :

> > Query Performs nicely, but no parallel workers are used:
> > GroupAggregate  (cost=2611148.87..2611152.89 rows=31 width=22) (actual
> > time=0.084..0.084 rows=0 loops=1)
> >Group Key:
> > f_zendesktickets_aaeljtllr5at3el.cstm_custom_38746665_primary_column
> >->  Sort  (cost=2611148.87..2611149.11 rows=99 width=28) (actual
> > time=0.082..0.082 rows=0 loops=1)
> >  Sort Key:
> > f_zendesktickets_aaeljtllr5at3el.cstm_custom_38746665_primary_column
> >  Sort Method: quicksort  Memory: 25kB
> >  ->  Nested Loop  (cost=1639.25..2611145.59 rows=99 width=28)
> > (actual time=0.076..0.076 rows=0 loops=1)
> >Join Filter:
> > (((f_ticketattributeshistory_aajzjp98uraszb6.attrnewvalue_id = ANY
> > ('{4757,4758,4759}'::integer[])) AND (4754 =
> > f_ticketattributeshistory_aajzjp98uraszb6.attroldvalue_id) AND (4790 =
> > f_ticketattributeshistory_aajzjp98uraszb6.ticketfield_id)) OR
> > (f_zendesktickets_aaeljtllr5at3el.dt_createda
> > t_id = f_ticketupdate_aad5jtwal0ayaax.dt_event_id))
> >->  Nested Loop  (cost=1638.81..1809540.39 rows=350270
> > width=20) (actual time=0.075..0.075 rows=0 loops=1)
> >  ->  Nested Loop  (cost=1638.24..1508474.08
> > rows=69140 width=8) (actual time=0.075..0.075 rows=0 loops=1)
> >->  Bitmap Heap Scan on
> > dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z
> > dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia
> > (cost=4.34..27.35 rows=7 width=4) (actual time=0.026..0.038 rows=7
> loops=1)
> >  Recheck Cond: (6171 = id_euweek)
> >  Heap Blocks: exact=7
> >  ->  Bitmap Index Scan on
> > dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx
> > (cost=0.00..4.33 rows=7 width=0) (actual time=0.019..0.019 rows=7
> loops=1)
> >Index Cond: (6171 = id_euweek)
> >->  Bitmap Heap Scan on
> > f_ticketupdate_aad5jtwal0ayaax  (cost=1633.90..214617.67 rows=87472
> > width=8) (actual time=0.004..0.004 rows=0 loops=7)
> >  Recheck Cond: (dt_event_id =
> > dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id
> >  >)
> >  ->  Bitmap Index Scan on
> > f_ticketupdate_aad5jtwal0ayaax_dt_event_id_idx  (cost=0.00..1612.03
> > rows=87472 width=0) (actual time=0.003..0.003 rows=0 loops=7)
> >Index Cond: (dt_event_id =
> > dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id
> >  >)
> >  ->  Index Scan using
> > f_ticketattributeshistory_aajzjp98uraszb6_ticketupdate_id_idx on
> > f_ticketattributeshistory_aajzjp98uraszb6  (cost=0.57..4.12 rows=23
> > width=20) (never executed)
> >Index Cond: (ticketupdate_id =
> > f_ticketupdate_aad5jtwal0ayaax.id
> > )
> >->  Index Scan using
> > f_zendesktickets_aaeljtllr5at3el_pkey on
> > f_zendesktickets_aaeljtllr5at3el  (cost=0.43..2.27 rows=1 width=12)
> > (never executed)
> >  Index Cond: (id =
> > f_ticketattributeshistory_aajzjp98uraszb6.zendesktickets_id)
> >  Filter: ((4765 <> status_id) AND (group_id = 17429))
> >  Planning time: 8.516 ms
> >  Execution time: 1.895 ms
> >
> > the speed is back
> >
>
> Yeah, but the cost is higher (2611152 vs. 1949508). So clearly, the
> database believes it's going to be cheaper. I suspect a part of the
> issue might be that the join is misestimated - it's expected to produce
> ~29k rows, but produces 0.
>
> Can you check if this query has the same issue? It's just the
> problematic join, and it should be simpler to investigate:
>
> SELECT count(*)
>   FROM f_ticketupdate_aad5jtwal0ayaax AS f
>   INNER JOIN
>   dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
>   ON (f.dt_event_id = d.id)
>
>

db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=#  set  max_parallel_workers_per_
gather=2;
SET
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw=# explain analyze SELECT count(*)
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw-#   FROM f_ticketupdate_aad5jtwal0ayaax
AS f
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw-#   INNER JOIN
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw-#   dwh_dm_aabv5kk9rxac4lz_
aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
db_sq3rjf5b7p309lq9wuqrh3qhk4gy9fbw-#   ON (f.dt_event_id = d.id);

QUERY PLAN


--
 Finalize Aggregate  (cost=1550912.23..1550912.24 

Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Tomas Vondra
> Query Performs nicely, but no parallel workers are used:
> GroupAggregate  (cost=2611148.87..2611152.89 rows=31 width=22) (actual
> time=0.084..0.084 rows=0 loops=1)
>    Group Key:
> f_zendesktickets_aaeljtllr5at3el.cstm_custom_38746665_primary_column
>    ->  Sort  (cost=2611148.87..2611149.11 rows=99 width=28) (actual
> time=0.082..0.082 rows=0 loops=1)
>          Sort Key:
> f_zendesktickets_aaeljtllr5at3el.cstm_custom_38746665_primary_column
>          Sort Method: quicksort  Memory: 25kB
>          ->  Nested Loop  (cost=1639.25..2611145.59 rows=99 width=28)
> (actual time=0.076..0.076 rows=0 loops=1)
>                Join Filter:
> (((f_ticketattributeshistory_aajzjp98uraszb6.attrnewvalue_id = ANY
> ('{4757,4758,4759}'::integer[])) AND (4754 =
> f_ticketattributeshistory_aajzjp98uraszb6.attroldvalue_id) AND (4790 =
> f_ticketattributeshistory_aajzjp98uraszb6.ticketfield_id)) OR
> (f_zendesktickets_aaeljtllr5at3el.dt_createda
> t_id = f_ticketupdate_aad5jtwal0ayaax.dt_event_id))
>                ->  Nested Loop  (cost=1638.81..1809540.39 rows=350270
> width=20) (actual time=0.075..0.075 rows=0 loops=1)
>                      ->  Nested Loop  (cost=1638.24..1508474.08
> rows=69140 width=8) (actual time=0.075..0.075 rows=0 loops=1)
>                            ->  Bitmap Heap Scan on
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia 
> (cost=4.34..27.35 rows=7 width=4) (actual time=0.026..0.038 rows=7 loops=1)
>                                  Recheck Cond: (6171 = id_euweek)
>                                  Heap Blocks: exact=7
>                                  ->  Bitmap Index Scan on
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx 
> (cost=0.00..4.33 rows=7 width=0) (actual time=0.019..0.019 rows=7 loops=1)
>                                        Index Cond: (6171 = id_euweek)
>                            ->  Bitmap Heap Scan on
> f_ticketupdate_aad5jtwal0ayaax  (cost=1633.90..214617.67 rows=87472
> width=8) (actual time=0.004..0.004 rows=0 loops=7)
>                                  Recheck Cond: (dt_event_id =
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id
> )
>                                  ->  Bitmap Index Scan on
> f_ticketupdate_aad5jtwal0ayaax_dt_event_id_idx  (cost=0.00..1612.03
> rows=87472 width=0) (actual time=0.003..0.003 rows=0 loops=7)
>                                        Index Cond: (dt_event_id =
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id
> )
>                      ->  Index Scan using
> f_ticketattributeshistory_aajzjp98uraszb6_ticketupdate_id_idx on
> f_ticketattributeshistory_aajzjp98uraszb6  (cost=0.57..4.12 rows=23
> width=20) (never executed)
>                            Index Cond: (ticketupdate_id =
> f_ticketupdate_aad5jtwal0ayaax.id
> )
>                ->  Index Scan using
> f_zendesktickets_aaeljtllr5at3el_pkey on
> f_zendesktickets_aaeljtllr5at3el  (cost=0.43..2.27 rows=1 width=12)
> (never executed)
>                      Index Cond: (id =
> f_ticketattributeshistory_aajzjp98uraszb6.zendesktickets_id)
>                      Filter: ((4765 <> status_id) AND (group_id = 17429))
>  Planning time: 8.516 ms
>  Execution time: 1.895 ms
> 
> the speed is back
> 

Yeah, but the cost is higher (2611152 vs. 1949508). So clearly, the
database believes it's going to be cheaper. I suspect a part of the
issue might be that the join is misestimated - it's expected to produce
~29k rows, but produces 0.

Can you check if this query has the same issue? It's just the
problematic join, and it should be simpler to investigate:

SELECT count(*)
  FROM f_ticketupdate_aad5jtwal0ayaax AS f
  INNER JOIN
  dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z AS d
  ON (f.dt_event_id = d.id)


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Pavel Stehule
2018-04-16 14:00 GMT+02:00 Tomas Vondra :

>
>
> On 04/16/2018 11:34 AM, Pavel Stehule wrote:
> > Hi,
> >
> > my customer does performance checks of PostgreSQL 9.5 and 10. Almost all
> > queries on 10 are faster, but there are a few queries (40 out of 1000) where
> > PostgreSQL 9.5 is significantly faster. Unfortunately, previously fast
> > queries (about 20 ms) are now too slow (5 sec).
> >
> > attached execution plans
> >
> > It looks like some cost issue, slow queries prefers Seq scan against
> > bitmap heap scan
> >
> > Hash Cond: (f_ticketupdate_aad5jtwal0ayaax.dt_event_id =
> > dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id
> >  >)
> >->  Parallel Seq Scan on f_ticketupdate_aad5jtwal0ayaax
> > (cost=0.00..1185867.47 rows=24054847 width=8) (actual
> > time=0.020..3741.409 rows=19243863 loops=3)
> >->  Hash  (cost=27.35..27.35 rows=7 width=4) (actual
> > time=0.089..0.089 rows=7 loops=3)
> > Buckets: 1024  Batches: 1  Memory Usage: 9kB
> >
> >
>
> What happens when you disable sequential scans on pg10?
>

set enable_seqscan=off;
set  max_parallel_workers_per_gather=2;

Query Performs nicely, but no parallel workers are used:
GroupAggregate  (cost=2611148.87..2611152.89 rows=31 width=22) (actual
time=0.084..0.084 rows=0 loops=1)
   Group Key: f_zendesktickets_aaeljtllr5at3el.cstm_custom_
38746665_primary_column
   ->  Sort  (cost=2611148.87..2611149.11 rows=99 width=28) (actual
time=0.082..0.082 rows=0 loops=1)
 Sort Key: f_zendesktickets_aaeljtllr5at3el.cstm_custom_
38746665_primary_column
 Sort Method: quicksort  Memory: 25kB
 ->  Nested Loop  (cost=1639.25..2611145.59 rows=99 width=28)
(actual time=0.076..0.076 rows=0 loops=1)
   Join Filter: (((f_ticketattributeshistory_
aajzjp98uraszb6.attrnewvalue_id = ANY ('{4757,4758,4759}'::integer[])) AND
(4754 = f_ticketattributeshistory_aajzjp98uraszb6.attroldvalue_id) AND
(4790 = f_ticketattributeshistory_aajzjp98uraszb6.ticketfield_id)) OR
(f_zendesktickets_aaeljtllr5at3el.dt_createda
t_id = f_ticketupdate_aad5jtwal0ayaax.dt_event_id))
   ->  Nested Loop  (cost=1638.81..1809540.39 rows=350270
width=20) (actual time=0.075..0.075 rows=0 loops=1)
 ->  Nested Loop  (cost=1638.24..1508474.08 rows=69140
width=8) (actual time=0.075..0.075 rows=0 loops=1)
   ->  Bitmap Heap Scan on dwh_dm_aabv5kk9rxac4lz_
aaonw7nchsan2n1_aad8xhr0m_aaewg8j61iagl1z dwh_dm_aabv5kk9rxac4lz_
aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia  (cost=4.34..27.35 rows=7 width=4)
(actual time=0.026..0.038 rows=7 loops=1)
 Recheck Cond: (6171 = id_euweek)
 Heap Blocks: exact=7
 ->  Bitmap Index Scan on
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_a_id_euweek_idx
(cost=0.00..4.33 rows=7 width=0) (actual time=0.019..0.019 rows=7 loops=1)
   Index Cond: (6171 = id_euweek)
   ->  Bitmap Heap Scan on
f_ticketupdate_aad5jtwal0ayaax
(cost=1633.90..214617.67 rows=87472 width=8) (actual time=0.004..0.004
rows=0 loops=7)
 Recheck Cond: (dt_event_id =
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id)
 ->  Bitmap Index Scan on f_ticketupdate_
aad5jtwal0ayaax_dt_event_id_idx  (cost=0.00..1612.03 rows=87472 width=0)
(actual time=0.003..0.003 rows=0 loops=7)
   Index Cond: (dt_event_id =
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id)
 ->  Index Scan using f_ticketattributeshistory_
aajzjp98uraszb6_ticketupdate_id_idx on
f_ticketattributeshistory_aajzjp98uraszb6
(cost=0.57..4.12 rows=23 width=20) (never executed)
   Index Cond: (ticketupdate_id = f_ticketupdate_
aad5jtwal0ayaax.id)
   ->  Index Scan using f_zendesktickets_aaeljtllr5at3el_pkey
on f_zendesktickets_aaeljtllr5at3el  (cost=0.43..2.27 rows=1 width=12)
(never executed)
 Index Cond: (id = f_ticketattributeshistory_
aajzjp98uraszb6.zendesktickets_id)
 Filter: ((4765 <> status_id) AND (group_id = 17429))
 Planning time: 8.516 ms
 Execution time: 1.895 ms

the speed is back



>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: Postgres 10 problem with UNION ALL of null value in "subselect"

2018-04-16 Thread Ashutosh Bapat
On Mon, Apr 16, 2018 at 4:10 PM, Martin Swiech  wrote:
> Hi folks,
>
> I got some complex query which works on PostgreSQL 9.6 , but fails on
> PostgreSQL 10.
>
> Version of PostgreSQL:
> PostgreSQL 10.3 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version
> 7.0.0 (clang-700.1.76), 64-bit
>
> Simplified core of the problematic query looks like this:
> ```
> select * from (
>select 1::integer as a
> ) t1
> union all
> select * from (
>select null as a
> ) t2;
> ```
>
> It fails with this error message:
> ```
> ERROR:  UNION types integer and text cannot be matched
> LINE 5: select * from (
>^
> SQL state: 42804
> Character: 66
> ```
>

The error disappears if we go one commit before
1e7c4bb0049732ece651d993d03bb6772e5d281a. But that's, I think, expected
with that commit.

We can work around this problem by casting null to integer like null::integer.
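
For example, this variant of the query above (only the cast added) should
work on both 9.6 and 10:

select * from (
   select 1::integer as a
) t1
union all
select * from (
   select null::integer as a
) t2;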


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: Prefix operator for text and spgist support

2018-04-16 Thread Ildus Kurbangaliev
On Mon, 16 Apr 2018 12:45:23 +0200
Emre Hasegeli  wrote:

> > Thank you, pushed with some editorization and renaming
> > text_startswith to starts_with  
> 
> I am sorry for not noticing this before, but what is the point of this
> operator?  It seems to me we are only making the prefix searching
> business, which is already complicated, more complicated.

Hi.

> 
> Also, the new operator is not documented on SQL String Functions and
> Operators table.  It is not supported by btree text_pattern_ops or
> btree indexes with COLLATE "C".  It is not defined for "citext", so
> people would get wrong results.  It doesn't use pg_trgm indexes
> whereas LIKE can.

It is mentioned in the documentation; look for the "starts_with" function.
Currently it works with spgist indexes, which is also pointed out in
the documentation. I was going to add btree support, but that would
require a new strategy, so it will be a matter for another patch. I think
this operator could be used in LIKE instead of the current weird comparison
operators.
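
A quick usage sketch (table and index names invented; the ^@ spelling is
the operator added by the committed patch, starts_with the function form):

create table words (w text);
create index words_spgist_idx on words using spgist (w);
select * from words where w ^@ 'post';            -- prefix test, can use the SP-GiST index
select * from words where starts_with(w, 'post'); -- equivalent function form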

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: SHOW ALL does not honor pg_read_all_settings membership

2018-04-16 Thread Laurenz Albe
On Thu, 2018-03-01 at 16:22 +0100, I wrote:
> I noticed that commit 25fff40798fc4ac11a241bfd9ab0c45c085e2212 forgot
> to teach SHOW ALL to show all GUCs when the user belongs to 
> pg_read_all_settings.
> 
> Patch attached; I think this should be backpatched.

Now that the dust from the last commitfest is settling, I'll make a second
attempt to attract attention for this small bug fix.

The original commit was Simon's.
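
For anyone who wants to reproduce the symptom quickly, here is a minimal
sketch (the role name is invented):

create role settings_reader login;
grant pg_read_all_settings to settings_reader;
-- connected as settings_reader:
show data_directory;  -- visible, thanks to pg_read_all_settings
show all;             -- bug: superuser-only settings are still left out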

Yours,
Laurenz Albe



Re: very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Tomas Vondra


On 04/16/2018 11:34 AM, Pavel Stehule wrote:
> Hi,
> 
> my customer does performance checks of PostgreSQL 9.5 and 10. Almost all
> queries on 10 are faster, but there are a few queries (40 out of 1000) where
> PostgreSQL 9.5 is significantly faster. Unfortunately, previously fast
> queries (about 20 ms) are now too slow (5 sec).
> 
> attached execution plans
> 
> It looks like some cost issue, slow queries prefers Seq scan against
> bitmap heap scan
> 
> Hash Cond: (f_ticketupdate_aad5jtwal0ayaax.dt_event_id =
> dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id
> )
>    ->  Parallel Seq Scan on f_ticketupdate_aad5jtwal0ayaax 
> (cost=0.00..1185867.47 rows=24054847 width=8) (actual
> time=0.020..3741.409 rows=19243863 loops=3)
>    ->  Hash  (cost=27.35..27.35 rows=7 width=4) (actual
> time=0.089..0.089 rows=7 loops=3)
>     Buckets: 1024  Batches: 1  Memory Usage: 9kB
> 
> 

What happens when you disable sequential scans on pg10?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-04-16 Thread Ashutosh Bapat
Hi,
Consider this scenario

postgres=# CREATE TABLE plt (a int, b int) PARTITION BY LIST(a);
postgres=# CREATE TABLE plt_p1 PARTITION OF plt FOR VALUES IN (1);
postgres=# CREATE TABLE plt_p2 PARTITION OF plt FOR VALUES IN (2);
postgres=# INSERT INTO plt VALUES (1, 1), (2, 2);
postgres=# CREATE FOREIGN TABLE fplt (a int, b int) SERVER loopback
OPTIONS (table_name 'plt');
postgres=# SELECT tableoid::regclass, ctid, * FROM fplt;
 tableoid | ctid  | a | b
--+---+---+---
 fplt | (0,1) | 1 | 1
 fplt | (0,1) | 2 | 2
(2 rows)

-- Need to use random() so that the following update doesn't turn into a
direct UPDATE.
postgres=# EXPLAIN (VERBOSE, COSTS OFF)
postgres-# UPDATE fplt SET b = (CASE WHEN random() <= 1 THEN 10 ELSE
20 END) WHERE a = 1;
 QUERY PLAN

 Update on public.fplt
   Remote SQL: UPDATE public.plt SET b = $2 WHERE ctid = $1
   ->  Foreign Scan on public.fplt
 Output: a, CASE WHEN (random() <= '1'::double precision) THEN
10 ELSE 20 END, ctid
 Remote SQL: SELECT a, ctid FROM public.plt WHERE ((a = 1)) FOR UPDATE
(5 rows)

postgres=# UPDATE fplt SET b = (CASE WHEN random() <= 1 THEN 10 ELSE
20 END) WHERE a = 1;
postgres=# SELECT tableoid::regclass, ctid, * FROM fplt;
 tableoid | ctid  | a | b
--+---+---+
 fplt | (0,2) | 1 | 10
 fplt | (0,2) | 2 | 10
(2 rows)

We expect only 1 row with a = 1 to be updated, but both rows get
updated. This happens because both rows have ctid = (0, 1) and
that's the only qualification used for UPDATE and DELETE. Thus when a
non-direct UPDATE is run on a foreign table which points to a
partitioned table or inheritance hierarchy on the foreign server, it
will update rows from all child tables whose ctids match those of the
qualifying rows. The same is the case with DELETE.

There are two ways to fix this:
1. Use WHERE CURRENT OF with cursors to update rows. This means that
we fetch only one row at a time and update it. This can slow down the
execution drastically.
2. Along with ctid use tableoid as a qualifier i.e. WHERE clause of
UPDATE/DELETE statement has ctid = $1 AND tableoid = $2 as conditions.
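
With the second approach, the deparsed remote statement takes roughly this
shape (a sketch, not necessarily the exact deparse output):

UPDATE public.plt SET b = $3 WHERE ctid = $1 AND tableoid = $2;

The ctid then only needs to identify a row within one child table on the
foreign server, not across the whole partition tree.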

PFA patch along the lines of 2nd approach and along with the
testcases. The idea is to inject tableoid attribute to be fetched from
the foreign server similar to ctid and then add it to the DML
statement being constructed.

It does fix the problem. But the patch as is interferes with the way
we handle tableoid currently. That can be seen from the regression
diffs that the patch causes.  Right now, every tableoid reference gets
converted into the tableoid of the table on the foreign server (and not
the tableoid of the local foreign table). Somehow we need to differentiate between the
tableoid injected for DML and tableoid references added by the user in
the original query and then use tableoid on the foreign server for the
first and local foreign table's oid for the second. Right now, I don't
see a simple way to do that.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 6e2fa14..d3c98d3 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1144,7 +1144,7 @@ deparseTargetList(StringInfo buf,
 	}
 
 	/*
-	 * Add ctid and oid if needed.  We currently don't support retrieving any
+	 * Add ctid, tableoid and oid if needed.  We currently don't support retrieving any
 	 * other system columns.
 	 */
 	if (bms_is_member(SelfItemPointerAttributeNumber - FirstLowInvalidHeapAttributeNumber,
@@ -1179,6 +1179,22 @@ deparseTargetList(StringInfo buf,
 		*retrieved_attrs = lappend_int(*retrieved_attrs,
 	   ObjectIdAttributeNumber);
 	}
+	if (bms_is_member(TableOidAttributeNumber - FirstLowInvalidHeapAttributeNumber,
+	  attrs_used))
+	{
+		if (!first)
+			appendStringInfoString(buf, ", ");
+		else if (is_returning)
+			appendStringInfoString(buf, " RETURNING ");
+		first = false;
+
+		if (qualify_col)
+			ADD_REL_QUALIFIER(buf, rtindex);
+		appendStringInfoString(buf, "tableoid");
+
+		*retrieved_attrs = lappend_int(*retrieved_attrs,
+	   TableOidAttributeNumber);
+	}
 
 	/* Don't generate bad syntax if no undropped columns */
 	if (first && !is_returning)
@@ -1725,7 +1741,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 	deparseRelation(buf, rel);
 	appendStringInfoString(buf, " SET ");
 
-	pindex = 2;	/* ctid is always the first param */
+	pindex = 3;	/* ctid, tableoid params appear first */
 	first = true;
 	foreach(lc, targetAttrs)
 	{
@@ -1739,7 +1755,7 @@ deparseUpdateSql(StringInfo buf, PlannerInfo *root,
 		appendStringInfo(buf, " = $%d", pindex);
 		pindex++;
 	}
-	appendStringInfoString(buf, " WHERE ctid = $1");
+	appendStringInfoString(buf, " WHERE ctid = $1 AND tableoid = $2");
 
 	deparseReturningList(buf, root, rtindex, rel,
 		 

Oddity in tuple routing for foreign partitions

2018-04-16 Thread Etsuro Fujita
Hi,

While updating the fix-postgres_fdw-WCO-handling patch, I noticed that
the patch for tuple routing for foreign partitions doesn't work well
with remote triggers.  Here is an example:

postgres=# create table loct1 (a int check (a in (1)), b text);
postgres=# create function br_insert_trigfunc() returns trigger as
$$begin new.b := new.b || ' triggered !'; return new; end$$ language
plpgsql;
postgres=# create trigger loct1_br_insert_trigger before insert on loct1
for each row execute procedure br_insert_trigfunc();
postgres=# create table itrtest (a int, b text) partition by list (a);
postgres=# create foreign table remp1 (a int check (a in (1)), b text)
server loopback options (table_name 'loct1');
postgres=# alter table itrtest attach partition remp1 for values in (1);

This behaves oddly:

postgres=# insert into itrtest values (1, 'foo') returning *;
 a |  b
---+-
 1 | foo
(1 row)

INSERT 0 1

The new value of b in the RETURNING result should be concatenated with '
triggered !', as shown below:

postgres=# select tableoid::regclass, * from itrtest ;
 tableoid | a |b
--+---+-
 remp1| 1 | foo triggered !
(1 row)

The reason for that is: in ExecInitPartitionInfo, the core calls
BeginForeignInsert via ExecInitRoutingInfo before initializing
ri_returningList of the ResultRelInfo for the partition, so
BeginForeignInsert sees that the ri_returningList is NULL and incorrectly
concludes that the local query doesn't have a RETURNING list.  So, I moved
the ExecInitRoutingInfo call to after building the ri_returningList (and
before handling ON CONFLICT, because we reference the conversion map
created by that call when handling that clause).  The first version of the
patch called BeginForeignInsert in that order, but I updated the patch
incorrectly.  :(

Also, I removed the CheckValidResultRel check from ExecInitRoutingInfo
and added it to ExecInitPartitionInfo, right after the InitResultRelInfo
call, because it would be better to abort the operation as soon as we find
the partition to be non-routable.

Another thing I noticed is that the RT index of the foreign partition is
set to 1 in postgresBeginForeignInsert when creating a remote query; that
would not work for cases where the partitioned table has an RT index
larger than 1 (eg, the partitioned table is not the primary ModifyTable
node).  In such cases, since the varno of Vars belonging to the foreign
partition in the RETURNING expressions is the same as that of the
partitioned table, calling deparseReturningList with the RT index set to 1
against the RETURNING expressions would produce a NULL attrs_used, leading
to postgres_fdw not retrieving the actually inserted data from the remote,
even if remote triggers might have changed those data.  So, I fixed this
as well, by setting the RT index to that of the partitioned table.

Attached is a patch for fixing these issues.  I'll add this to the open
items list for PG11.  (After addressing this, I'll post an updated version
of the fix-postgres_fdw-WCO-handling patch.)

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/expected/postgres_fdw.out
--- b/contrib/postgres_fdw/expected/postgres_fdw.out
***
*** 7454,7459  select tableoid::regclass, * FROM itrtest;
--- 7454,7501 
   remp1| 1 | foo
  (1 row)
  
+ delete from itrtest;
+ drop index loct1_idx;
+ -- Test that remote triggers work as expected
+ create function br_insert_trigfunc() returns trigger as $$
+ begin
+ 	new.b := new.b || ' triggered !';
+ 	return new;
+ end
+ $$ language plpgsql;
+ create trigger loct1_br_insert_trigger before insert on loct1
+ 	for each row execute procedure br_insert_trigfunc();
+ create trigger loct2_br_insert_trigger before insert on loct2
+ 	for each row execute procedure br_insert_trigfunc();
+ -- The new values are concatenated with ' triggered !'
+ insert into itrtest values (1, 'foo') returning *;
+  a |b
+ ---+-
+  1 | foo triggered !
+ (1 row)
+ 
+ insert into itrtest values (2, 'qux') returning *;
+  a |b
+ ---+-
+  2 | qux triggered !
+ (1 row)
+ 
+ insert into itrtest values (1, 'test1'), (2, 'test2') returning *;
+  a | b 
+ ---+---
+  1 | test1 triggered !
+  2 | test2 triggered !
+ (2 rows)
+ 
+ with result as (insert into itrtest values (1, 'test1'), (2, 'test2') returning *) select * from result;
+  a | b 
+ ---+---
+  1 | test1 triggered !
+  2 | test2 triggered !
+ (2 rows)
+ 
+ drop trigger loct1_br_insert_trigger on loct1;
+ drop trigger loct2_br_insert_trigger on loct2;
  drop table itrtest;
  drop table loct1;
  drop table loct2;
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 1975,1982  postgresBeginForeignInsert(ModifyTableState *mtstate,
  {
  	PgFdwModifyState *fmstate;
  	Plan	   *plan = mtstate->ps.plan;
  	Relation	rel = resultRelInfo->ri_RelationDesc;
- 	RangeTblEntry *rte;
  

Re: Postgres stucks in deadlock detection

2018-04-16 Thread Konstantin Knizhnik



On 14.04.2018 10:09, Юрий Соколов wrote:
On Fri, 13 Apr 2018 at 21:10, Andres Freund wrote:


Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > +  * Ensure that only one backend is checking for deadlock.
> > > +  * Otherwise under high load cascade of deadlock timeout
expirations can cause stuck of Postgres.
> > > +  */
> > > + if
(!pg_atomic_test_set_flag(>activeDeadlockCheck))
> > > + {
> > > +  enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +         return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's
absolutely no
> > bound for deadlock checks happening even under light
concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid
deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise either probability that all backends  once and once
again are
> trying to check deadlocks concurrently is very small (and can be
even more
> reduced by using random timeout for subsequent deadlock checks),
either
> system can not normally function in any case because large
number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different
approaches, but IMHO
> the currently used strategy is the worst one, because it can
stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than
normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution of the
problem9too slow
> deadlock detection), I think that first it is necessary to
understand
> whether there is such problem at al and under which workload it
can happen.

Sure. I'm not saying that you shouldn't experiment with a patch
like the
one you sent. What I am saying is that that can't be the actual
solution
that will be integrated.


What about my version?
at 
https://www.postgresql.org/message-id/flat/bac42052debbd66e8d5f786d8abe8...@postgrespro.ru
It still performs deadlock detection every time, but it tries to 
detect it with shared lock first,
and only if there is probability of real deadlock, it rechecks with 
exclusive lock.


Although even a shared lock leads to some stalling for active 
transactions, in the catastrophic situation where many backends 
check for a nonexistent deadlock at the same time, it greatly reduces 
the pause.


Unfortunately your patch doesn't help with inserts.
In most cases Postgres obtains exclusive partition locks: any 
lock/unlock operation requires exclusive lock.
Shared locks of all partitions are used for collecting some information 
or to perform recovery.


Below are the results of three different versions:

1. Vanilla Postgres:

[kk@hp06 knizhnik]$ pgbench -n -c 1000 -j 36 -T 100 -P 1 -f test.sql 
postgres

progress: 1.0 s, 227.9 tps, lat 109.560 ms stddev 189.328
progress: 2.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 3.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 4.0 s, 53.0 tps, lat 1145.417 ms stddev 1607.484
progress: 5.0 s, 19.0 tps, lat 236.807 ms stddev 819.307
progress: 6.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 7.0 s, 2.0 tps, lat 6649.944 ms stddev 9.691
progress: 8.0 s, 60.0 tps, lat 2343.313 ms stddev 3335.967
progress: 9.0 s, 89.0 tps, lat 1813.495 ms stddev 3337.904
progress: 10.0 s, 99.1 tps, lat 2093.653 ms stddev 3758.468
progress: 11.0 s, 94.9 tps, lat 2424.560 ms stddev 4258.622
progress: 12.0 s, 79.0 tps, lat 2037.880 ms stddev 4152.258
progress: 13.0 s, 30.0 tps, lat 80.697 ms stddev 24.854
progress: 14.0 s, 2.0 tps, lat 71.642 ms stddev 0.022
progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 16.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 17.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 18.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 19.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 20.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 21.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 22.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 23.0 s, 3.0 tps, lat 22396.463 ms stddev 1.964
progress: 24.0 s, 1.0 tps, lat 23667.927 ms stddev 0.001
progress: 25.0 s, 25.0 tps, lat 8862.603 ms stddev 11545.517
progress: 26.0 s, 27.0 tps, lat 6738.338 ms stddev 10984.215
progress: 27.0 s, 40.0 tps, lat 6656.893 ms stddev 11236.042
progress: 28.0 s, 

Re: Prefix operator for text and spgist support

2018-04-16 Thread Emre Hasegeli
> Thank you, pushed with some editorization and renaming text_startswith to
> starts_with

I am sorry for not noticing this before, but what is the point of this
operator?  It seems to me we are only making the prefix searching
business, which is already complicated, more complicated.

Also, the new operator is not documented on SQL String Functions and
Operators table.  It is not supported by btree text_pattern_ops or
btree indexes with COLLATE "C".  It is not defined for "citext", so
people would get wrong results.  It doesn't use pg_trgm indexes
whereas LIKE can.



Postgres 10 problem with UNION ALL of null value in "subselect"

2018-04-16 Thread Martin Swiech
Hi folks,

I got some complex query which works on PostgreSQL 9.6 , but fails on
PostgreSQL 10.

Version of PostgreSQL:
PostgreSQL 10.3 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM
version 7.0.0 (clang-700.1.76), 64-bit

Simplified core of the problematic query looks like this:
```
select * from (
   select 1::integer as a
) t1
union all
select * from (
   select null as a
) t2;
```

It fails with this error message:
```
ERROR:  UNION types integer and text cannot be matched
LINE 5: select * from (
   ^
SQL state: 42804
Character: 66
```


It worked on PostgreSQL 9.6.


Query without wrapping subselects (t1 and t2) works on both versions of
PostgreSQL (9.6 and 10) well:
```
select 1::integer as a
union all
select null as a;
```


Is there some new optimization of query processing in PostgreSQL 10, which
needs some "early type determination", but the named subselects (t1 and t2)
shade the type from the first query?

Or could it be some regression bug?

Thanks for answer.

Martin Swiech


very slow queries when max_parallel_workers_per_gather is higher than zero

2018-04-16 Thread Pavel Stehule
Hi,

my customer does performance checks of PostgreSQL 9.5 and 10. Almost all
queries on 10 are faster, but there are a few queries (40 out of 1000) where
PostgreSQL 9.5 is significantly faster. Unfortunately, previously fast
queries (about 20 ms) are now too slow (5 sec).

attached execution plans

It looks like some cost issue, slow queries prefers Seq scan against bitmap
heap scan

Hash Cond: (f_ticketupdate_aad5jtwal0ayaax.dt_event_id =
dwh_dm_aabv5kk9rxac4lz_aaonw7nchsan2n1_aad8xhr0m_aaewg8j61ia.id)
   ->  Parallel Seq Scan on f_ticketupdate_aad5jtwal0ayaax
(cost=0.00..1185867.47 rows=24054847 width=8) (actual time=0.020..3741.409
rows=19243863 loops=3)
   ->  Hash  (cost=27.35..27.35 rows=7 width=4) (actual time=0.089..0.089
rows=7 loops=3)
Buckets: 1024  Batches: 1  Memory Usage: 9kB


Regards

Pavel


pg10_regression.sql
Description: application/sql


Re: submake-errcodes

2018-04-16 Thread Devrim Gündüz

Hi,

On Mon, 2018-04-16 at 09:24 +0200, Christoph Berg wrote:
> Fwiw, setting MAKELEVEL=0 worked. Thanks!

Great, it solved my problem as well! Thanks Tom.

Regards,
-- 
Devrim Gündüz
EnterpriseDB: https://www.enterprisedb.com
PostgreSQL Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR

signature.asc
Description: This is a digitally signed message part


Re: ON CONFLICT DO UPDATE for partitioned tables

2018-04-16 Thread Amit Langote
On 2018/04/10 11:56, Amit Langote wrote:
> On 2018/03/27 13:27, Amit Langote wrote:
>> On 2018/03/26 23:20, Alvaro Herrera wrote:
>>> The one thing I wasn't terribly in love with is the four calls to
>>> map_partition_varattnos(), creating the attribute map four times ... but
>>> we already have it in the TupleConversionMap, no?  Looks like we could
>>> save a bunch of work there.
>>
>> Hmm, actually we can't use that map, assuming you're talking about the
>> following map:
>>
>>   TupleConversionMap *map = proute->parent_child_tupconv_maps[partidx];
>>
>> We can only use that to tell if we need converting expressions (as we
>> currently do), but it cannot be used to actually convert the expressions.
>> The map in question is for use by do_convert_tuple(), not to map varattnos
>> in Vars using map_variable_attnos().
>>
>> But it's definitely a bit undesirable to have various
>> map_partition_varattnos() calls within ExecInitPartitionInfo() to
>> initialize the same information (the map) multiple times.
> 
> I will try to think of doing something about this later this week.

The solution I came up with is to call map_variable_attnos() directly,
instead of going through map_partition_varattnos() every time, after first
creating the attribute map ourselves.

>>> And a final item is: can we have a whole-row expression in the clauses?
>>> We currently don't handle those either, not even to throw an error.
>>> [figures a test case] ... and now that I test it, it does crash!
>>>
>>> create table part (a int primary key, b text) partition by range (a);
>>> create table part1 (b text, a int not null);
>>> alter table part attach partition part1 for values from (1) to (1000);
>>> insert into part values (1, 'two') on conflict (a)
>>>   do update set b = format('%s (was %s)', excluded.b, part.b)
>>>   where part.* *<> (1, text 'two');
>>>
>>> I think this means we should definitely handle found_whole_row.  (If you
>>> create part1 in the normal way, it works as you'd expect.)
>>
>> I agree.  That means we simply remove the Assert after the
>> map_partition_varattnos call.
>>
>>> I'm going to close a few other things first, then come back to this.
>>
>> Attached find a patch to fix the whole-row expression issue.  I added your
>> test to insert_conflict.sql.

Combined the above patch into the attached patch.

Thanks,
Amit
From a90decd69a42bebdb6e07c8268686c0500f8c48e Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 16 Apr 2018 17:31:42 +0900
Subject: [PATCH v2] Couple of fixes for ExecInitPartitionInfo

First, avoid repeated calling of map_partition_varattnos.  To do that,
generate the rootrel -> partrel attribute conversion map ourselves
just once and call map_variable_attnos() directly with it.

Second, support conversion of whole-row variables that appear in
ON CONFLICT expressions.  Add relevant test.
---
 src/backend/executor/execPartition.c  | 88 ---
 src/test/regress/expected/insert_conflict.out | 16 +
 src/test/regress/sql/insert_conflict.sql  | 14 +
 3 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/src/backend/executor/execPartition.c 
b/src/backend/executor/execPartition.c
index 218645d43b..1727e111bb 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
 #include "nodes/makefuncs.h"
 #include "partitioning/partbounds.h"
 #include "partitioning/partprune.h"
+#include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
 #include "utils/partcache.h"
 #include "utils/rel.h"
@@ -309,6 +310,8 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
partrel;
ResultRelInfo *leaf_part_rri;
MemoryContext oldContext;
+   AttrNumber *part_attnos = NULL;
+   boolfound_whole_row;
 
/*
 * We locked all the partitions in ExecSetupPartitionTupleRouting
@@ -397,8 +400,19 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
/*
 * Convert Vars in it to contain this partition's attribute 
numbers.
 */
-   wcoList = map_partition_varattnos(wcoList, firstVarno,
-   
  partrel, firstResultRel, NULL);
+   part_attnos =
+   convert_tuples_by_name_map(RelationGetDescr(partrel),
+  
RelationGetDescr(firstResultRel),
+  
gettext_noop("could not convert row type"));
+   wcoList = (List *)
+   map_variable_attnos((Node *) wcoList,
+   
firstVarno, 0,
+   
part_attnos,
+   

Re: [HACKERS] [PATCH] Lockable views

2018-04-16 Thread Yugo Nagata
On Thu, 05 Apr 2018 07:53:42 +0900 (JST)
Tatsuo Ishii  wrote:

I updated the patch to fix the lockable view issues.

> >> > > +typedef struct
> >> > > +{
> >> > > +  Oid root_reloid;
> >> > > +  LOCKMODE lockmode;
> >> > > +  bool nowait;
> >> > > +  Oid viewowner;
> >> > > +  Oid viewoid;
> >> > > +} LockViewRecurse_context;
> >> > 
> >> > Probably wouldn't hurt to pgindent the larger changes in the patch.
> 
> Yeah. Also, each struct member needs a comment.

I applied pgindent and added comments to struct members.

> 
> >> > Hm, how can that happen? And if it can happen, why can it only happen
> >> > with the root relation?
> >> 
> >> For example, the following queries cause the infinite recursion of views. 
> >> This is detected and the error is raised.
> >> 
> >>  create table t (i int);
> >>  create view v1 as select 1;
> >>  create view v2 as select * from v1;
> >>  create or replace view v1 as select * from v2;
> >>  begin;
> >>  lock v1;
> >>  abort;
> >> 
> >> However, I found that the previous patch could not handle the following
> >> situation in which the root relation itself doesn't have infinite 
> >> recursion.
> >> 
> >>  create view v3 as select * from v1;
> >>  begin;
> >>  lock v3;
> >>  abort;
> 
> Shouldn't they be in the regression test?

Added tests for the infinite recursion detection.

Regards,

> 
> It's shame that create_view test does not include the cases, but it's
> another story.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp


-- 
Yugo Nagata 
diff --git a/doc/src/sgml/ref/lock.sgml b/doc/src/sgml/ref/lock.sgml
index 96d477c..a225cea 100644
--- a/doc/src/sgml/ref/lock.sgml
+++ b/doc/src/sgml/ref/lock.sgml
@@ -46,8 +46,8 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]
   
 
   
-   When a view is specified to be locked, all relations appearing in the view
-   definition query are also locked recursively with the same lock mode. 
+   When a view is locked, all relations appearing in the view definition
+   query are also locked recursively with the same lock mode.
   
 
   
@@ -174,6 +174,13 @@ LOCK [ TABLE ] [ ONLY ] name [ * ]

 

+The user performing the lock on the view must have the corresponding privilege
+on the view.  In addition the view's owner must have the relevant privileges on
+the underlying base relations, but the user performing the lock does
+not need any permissions on the underlying base relations.
+   
+
+   
 LOCK TABLE is useless outside a transaction block: the lock
 would remain held only to the completion of the statement.  Therefore
 PostgreSQL reports an error if LOCK
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index b247c0f..5e0ef48 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -31,7 +31,7 @@ static void LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid use
 static AclResult LockTableAclCheck(Oid relid, LOCKMODE lockmode, Oid userid);
 static void RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid,
 			 Oid oldrelid, void *arg);
-static void LockViewRecurse(Oid reloid, Oid root_reloid, LOCKMODE lockmode, bool nowait);
+static void LockViewRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, List *ancestor_views);
 
 /*
  * LOCK TABLE
@@ -67,7 +67,7 @@ LockTableCommand(LockStmt *lockstmt)
 		  (void *) >mode);
 
 		if (get_rel_relkind(reloid) == RELKIND_VIEW)
-			LockViewRecurse(reloid, reloid, lockstmt->mode, lockstmt->nowait);
+			LockViewRecurse(reloid, lockstmt->mode, lockstmt->nowait, NIL);
 		else if (recurse)
 			LockTableRecurse(reloid, lockstmt->mode, lockstmt->nowait, GetUserId());
 	}
@@ -92,7 +92,6 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid,
 		return;	/* woops, concurrently dropped; no permissions
  * check */
 
-
 	/* Currently, we only allow plain tables or views to be locked */
 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE &&
 		relkind != RELKIND_VIEW)
@@ -178,11 +177,11 @@ LockTableRecurse(Oid reloid, LOCKMODE lockmode, bool nowait, Oid userid)
 
 typedef struct
 {
-	Oid root_reloid;
-	LOCKMODE lockmode;
-	bool nowait;
-	Oid viewowner;
-	Oid viewoid;
+	LOCKMODE	lockmode;		/* lock mode to use */
+	bool		nowait;			/* no wait mode */
+	Oid			viewowner;		/* view owner for checking the privilege */
+	Oid			viewoid;		/* OID of the view to be locked */
+	List	   *ancestor_views;	/* OIDs of ancestor views */
 } LockViewRecurse_context;
 
 static bool
@@ -193,19 +192,22 @@ LockViewRecurse_walker(Node *node, LockViewRecurse_context *context)
 
 	if (IsA(node, Query))
 	{
-		Query		*query = (Query *) node;
-		ListCell	*rtable;
+		Query	   *query = (Query *) node;
+		ListCell   *rtable;
 
 		foreach(rtable, query->rtable)
 		{
-			RangeTblEntry	*rte = lfirst(rtable);
-			AclResult		 

Re: auto_explain and parallel queries issue

2018-04-16 Thread Pavel Stehule
2018-04-16 10:33 GMT+02:00 Tomas Vondra :

>
>
> On 04/16/2018 09:25 AM, Pavel Stehule wrote:
> > Hi
> >
> > I am testing PostgreSQL in customer's environment and I found some
> > unexpected behave.
> >
> > When auto_explain is used, and there is slower parallel query, then
> > auto_explain raises log for any slow process.
> >
>
> FWIW this is the same issue that I reported in 2016:
>
> https://www.postgresql.org/message-id/3f62f24e-51b3-175c-
> 9222-95f25fd2a9d6%402ndquadrant.com


Yes. The fix should not be too hard. Only the process that owns the query
text should do the reporting.

Regards

Pavel


>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: auto_explain and parallel queries issue

2018-04-16 Thread Tomas Vondra


On 04/16/2018 09:25 AM, Pavel Stehule wrote:
> Hi
> 
> I am testing PostgreSQL in customer's environment and I found some
> unexpected behave.
> 
> When auto_explain is used, and there is slower parallel query, then
> auto_explain raises log for any slow process.
> 

FWIW this is the same issue that I reported in 2016:

https://www.postgresql.org/message-id/3f62f24e-51b3-175c-9222-95f25fd2a9d6%402ndquadrant.com

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Documentation for bootstrap data conversion

2018-04-16 Thread John Naylor
On 4/9/18, Tom Lane  wrote:
>> 3. It seems the preferred style is to refer to "bootstrap" relations
>> rather than "bootstrapped" relations. The attached patch makes code
>> comments more like the docs in this regard.
>
> Meh, I think either is fine really.  I do recall changing something
> in bki.sgml that referred to both "bootstrap relations" and "bootstrap
> catalogs" in practically the same sentence.  I think that *is* confusing,
> because it's not obvious whether relation and catalog are meant to be
> interchangeable terms (and, in general, they aren't).  If we wanted to
> do anything here I'd be more interested in s/relation/catalog/g in this
> usage.

I was curious, so did these searches

git grep  'bootstrap\S* relation'
git grep  'system .*relation'

and dug through a bit to find cases where 'catalog' is clearly a
better term. Most of these are in the pg_*.h/.dat file boilerplate
comments, which would be easy enough to change with a script. If we're
going to do that, the word order of this phrase now seems a bit
awkward:

definition of the system "access method" catalog (pg_am)

so we may as well go the extra step and do:

definition of the "access method" system catalog (pg_am)

Thoughts?

Beyond that, there are many cases where the language might be
debatable, or at least it's not obvious to the casual observer whether
it could also be referring to an index or toast table, so it's
probably best to leave well enough alone for most cases.

-John Naylor



Re: partitioning code reorganization

2018-04-16 Thread Amit Langote
On 2018/04/15 9:17, Alvaro Herrera wrote:
> Amit Langote wrote:
>> On Sat, Apr 14, 2018 at 11:48 PM, Amit Langote  
>> wrote:
>>> Hi.
>>>
>>> Thanks for taking care of few things I left like those PartitionKey
>>> accessors in rel.h.
>>
>> Forgot to mention -- there are some files that still include
>> catalog/partition.h but no longer need to.  Find a delta patch
>> attached that applies on your v6.
> 
> Thanks!  I pushed this now, putting back the enum in parsenodes and
> including this delta patch of yours.

Not sure if you were intending to discuss the remaining portion of the
changes I proposed last week (patch 0002 posted at [1]), but I'm posting
those patches here for discussion.  I've divided the patch further.

0001-Make-copying-of-cached-partitioning-info-more-con.patch

Make copying of cached partitioning info more consistent

Currently there are many callers that hold onto pointers that
point into the partitioning related information cached in relcache.
There are others, such as the planner, who copy important information
before using it.

Make everyone copy!

Now because no part of the backend relies on the guarantee that
pointers to partitioning info in relcache point to the same memory even
across relcache flushes, we don't need special guards as implemented
in RelationClearRelation() to provide the aforementioned guarantee.

0002-Cache-all-partitioning-info-under-one-memory-cont.patch

Cache all partitioning info under one memory context

Instead of having one for PartitionKey and another for PartitionDesc,
use just one.  Also, instead of allocating partition constraint
expression tree directly under CacheMemoryContext, do it under the
aforementioned context.

0003-Cache-partsupfunc-separately-from-PartitionKey.patch

Cache partsupfunc separately from PartitionKey

Callers who want to use partsupfunc now have to copy them separately
from PartitionKey, which makes copying the latter a bit cheaper.


I think going the way of 0001 might seem counter to what we may *really*
want to do in this regard, which is to make it so that we can use (keep
around) the pointer to partition info in relcache, instead of copying the
information in it piece by piece for every query.  Robert's email from a
couple of months ago (that he also recently mentioned) brought this up wrt
to relcache data usage within the planner:

* RelOptInfo -> Relation *
https://www.postgresql.org/message-id/CA%2BTgmoYKToP4-adCFFRNrO21OGuH%3Dphx-fiB1dYoqksNYX6YHQ%40mail.gmail.com

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BHiwqFo_NbJfS%2BY%3DtE94Tn5EVHXN02JkmGjwV4xT6fU3oc5OQ%40mail.gmail.com
From 56b8ef3454103349afa0dd9f5c0335f8ffc1629a Mon Sep 17 00:00:00 2001
From: Amit 
Date: Sat, 14 Apr 2018 00:22:55 +0900
Subject: [PATCH v1 1/3] Make copying of cached partitioning info more
 consistent

Currently there are many callers that hold onto pointers that
point into the partitioning related information cached in relcache.
There are others, such as the planner, who copy important information
before using the information.

Make everyone copy!

Now because no part of the backend relies on the guarantee that
pointers to partitioning info in relcache point to the same memory even
across relcache flushes, we don't need special guards as implemented
in RelationClearRelation() to provide the aforementioned guarantee.
---
 src/backend/catalog/heap.c |   2 +-
 src/backend/catalog/partition.c|  20 +
 src/backend/catalog/pg_constraint.c|   9 ++-
 src/backend/commands/indexcmds.c   |   8 +-
 src/backend/commands/tablecmds.c   |  75 --
 src/backend/commands/trigger.c |  23 +++---
 src/backend/executor/execPartition.c   |  63 ---
 src/backend/optimizer/prep/prepunion.c |  14 ++--
 src/backend/optimizer/util/plancat.c   |  75 ++
 src/backend/parser/parse_utilcmd.c |   4 +-
 src/backend/partitioning/partbounds.c  |  38 +
 src/backend/partitioning/partprune.c   |  25 +++---
 src/backend/utils/cache/partcache.c| 140 -
 src/backend/utils/cache/relcache.c |  95 ++
 src/include/catalog/partition.h|   2 -
 src/include/executor/execPartition.h   |  10 ++-
 src/include/partitioning/partbounds.h  |   2 +-
 src/include/partitioning/partprune.h   |   2 +-
 src/include/utils/partcache.h  |   5 ++
 src/include/utils/rel.h|  12 ---
 20 files changed, 318 insertions(+), 306 deletions(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 39813de991..bdb47001c8 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3473,7 +3473,7 @@ StorePartitionBound(Relation rel, Relation parent, 
PartitionBoundSpec *bound)
 * relcache entry for that partition every time a partition is added or
 * 

auto_explain and parallel queries issue

2018-04-16 Thread Pavel Stehule
Hi

I am testing PostgreSQL in a customer's environment and I found some
unexpected behavior.

When auto_explain is used and a parallel query is slow, auto_explain
emits a log entry for every slow process.

 parallel worker for PID 20089   : auto_explain database=NULL client=local
appname=psql
 parallel worker for PID 20089   :  duration: 121.250 ms  plan:
 parallel worker for PID 20089   :  Query Text: select sum(a) from
bigtable group by b;
 parallel worker for PID 20089   :  Partial HashAggregate
(cost=10675.00..10774.59 rows=9959 width=12)
 parallel worker for PID 20089   :Group Key: b
 parallel worker for PID 20089   :->  Parallel Seq Scan on
bigtable  (cost=0.00..8591.67 rows=416667 width=8)
 parallel worker for PID 20089   : auto_explain database=NULL client=local
appname=psql
 parallel worker for PID 20089   :  duration: 128.247 ms  plan:
 parallel worker for PID 20089   :  Query Text: select sum(a) from
bigtable group by b;
 parallel worker for PID 20089   :  Partial HashAggregate
(cost=10675.00..10774.59 rows=9959 width=12)
 parallel worker for PID 20089   :Group Key: b
 parallel worker for PID 20089   :->  Parallel Seq Scan on
bigtable  (cost=0.00..8591.67 rows=416667 width=8)
 res [local] SELECT: auto_explain database=postgres client=[local]
appname=psql
 res [local] SELECT:  duration: 149.478 ms  plan:
 res [local] SELECT:  Query Text: select sum(a) from bigtable group
by b;
 res [local] SELECT:  Finalize HashAggregate
(cost=13865.98..13965.57 rows=9959 width=12)
 res [local] SELECT:Group Key: b
 res [local] SELECT:->  Gather  (cost=11675.00..13766.39
rows=19918 width=12)
 res [local] SELECT:  Workers Planned: 2
 res [local] SELECT:  ->  Partial HashAggregate
(cost=10675.00..10774.59 rows=9959 width=12)
 res [local] SELECT:Group Key: b
 res [local] SELECT:->  Parallel Seq Scan on
bigtable  (cost=0.00..8591.67 rows=416667 width=8)

Is this expected behavior?  At a minimum it is undocumented, and in this
case the global variable MyProcPort is not initialized.
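
For illustration only (this is not the current auto_explain code, and
not necessarily the right fix), a minimal sketch of how an executor-hook
extension could skip logging in parallel workers; prev_ExecutorEnd is a
hypothetical saved hook pointer, and IsParallelWorker() comes from
access/parallel.h:

    #include "postgres.h"
    #include "access/parallel.h"
    #include "executor/executor.h"

    static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;

    static void
    my_ExecutorEnd(QueryDesc *queryDesc)
    {
        /* log only from the leader, not from each parallel worker */
        if (!IsParallelWorker())
            elog(LOG, "would log duration and plan here");

        if (prev_ExecutorEnd)
            prev_ExecutorEnd(queryDesc);
        else
            standard_ExecutorEnd(queryDesc);
    }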

Regards

Pavel


Re: submake-errcodes

2018-04-16 Thread Christoph Berg
Re: To Tom Lane 2018-04-12 <20180412202717.ga32...@msg.df7cb.de>
> > ... or then again, maybe I do.  Is it possible that your build
> > recipe involves invoking our makefiles from an outer "make" run?
> > If so, maybe you need to explicitly set MAKELEVEL=0 when invoking
> > our build, to keep it from thinking it is a sub-make.  Not sure
> > about whether it'd be wise to reset MAKEFLAGS as well.
> 
> I don't know about Devrim's case, but debian/rules is indeed a
> makefile. Will look closer again once I'm back from pgconf.de, thanks
> for the hint.

Fwiw, setting MAKELEVEL=0 worked. Thanks!

https://salsa.debian.org/postgresql/postgresql/blob/11/debian/rules#L147-148

Christoph



Re: Boolean partitions syntax

2018-04-16 Thread Kyotaro HORIGUCHI
Sorry for a silly typo.

At Mon, 16 Apr 2018 16:17:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20180416.161740.51264437.horiguchi.kyot...@lab.ntt.co.jp>
> Hello. Thank you for the comment.
> 
> the attached v6 patch differs only in gram.y since v5.
> 
> At Fri, 13 Apr 2018 18:55:30 +0900, Amit Langote 
>  wrote in 
> 
> > Horiguchi-san,
> > 
> > Thanks for the latest patch.
> > 
> > On 2018/04/12 13:12, Kyotaro HORIGUCHI wrote:
> > > Thank you for verification and the revised patch. The format is
> > > fine and the fix is correct but I noticed that I forgot to remove
> > > plural S's from error messages. The attached is the version with
> > > the fix.
> > 
> > Looking at the gram.y changes in the latest patch, I think there need to
> > be some explanatory comments about the new productions -- u_expr,
> > b0_expr, and c0_expr.
> 
> I think I did that. And refactored the rules.
> 
> It was a bother that some rules used c_expr directly but I
> managed to replace all of them with a_expr by lowering precedence

s/lowering/increasing/;

> of some ordinary keywords (PASSING, BY, COLUMNS and ROW). c_expr
> is no longer used elsewhere so we can just remove columnref from
> c_expr. Finally [abc]0_expr was eliminated and we have only
> a_expr, b_expr, u_expr and c_expr. This seems simple enough.
> 
> The relationship among the rules after this change is as follows.
> 
> a_expr --+-- columnref
>  +-- u_expr ---+-- c_expr -- (all old c_expr stuff except columnref)
>+-- (all old a_expr stuff)
>
> b_expr --+-- columnref
>  +-- c_expr -- (ditto)
>  +-- (all old b_expr stuff)
> 
> While fixing this, I noticed that the precedence of some
> keywords (PRESERVE, STRIP_P) was higher than necessary and
> corrected it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Boolean partitions syntax

2018-04-16 Thread Kyotaro HORIGUCHI
Hello. Thank you for the comment.

the attached v6 patch differs only in gram.y since v5.

At Fri, 13 Apr 2018 18:55:30 +0900, Amit Langote 
 wrote in 

> Horiguchi-san,
> 
> Thanks for the latest patch.
> 
> On 2018/04/12 13:12, Kyotaro HORIGUCHI wrote:
> > Thank you for verification and the revised patch. The format is
> > fine and the fix is correct but I noticed that I forgot to remove
> > plural S's from error messages. The attached is the version with
> > the fix.
> 
> Looking at the gram.y changes in the latest patch, I think there need to
> be some explanatory comments about the new productions -- u_expr,
> b0_expr, and c0_expr.

I think I did that. And refactored the rules.

It was a bother that some rules used c_expr directly but I
managed to replace all of them with a_expr by lowering precedence
of some ordinary keywords (PASSING, BY, COLUMNS and ROW). c_expr
is no longer used elsewhere so we can just remove columnref from
c_expr. Finally [abc]0_expr was eliminated and we have only
a_expr, b_expr, u_expr and c_expr. This seems simple enough.

The relationship among the rules after this change is as follows.

a_expr --+-- columnref
 +-- u_expr ---+-- c_expr -- (all old c_expr stuff except columnref)
   +-- (all old a_expr stuff)
   
b_expr --+-- columnref
 +-- c_expr -- (ditto)
 +-- (all old b_expr stuff)

While fixing this, I noticed that the precedence of some
keywords (PRESERVE, STRIP_P) was higher than necessary and
corrected it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index ed6b680ed8..cfb0984100 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -152,8 +152,6 @@ static Node *substitute_actual_parameters(Node *expr, int nargs, List *args,
 static Node *substitute_actual_parameters_mutator(Node *node,
 	 substitute_actual_parameters_context *context);
 static void sql_inline_error_callback(void *arg);
-static Expr *evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
-			  Oid result_collation);
 static Query *substitute_actual_srf_parameters(Query *expr,
  int nargs, List *args);
 static Node *substitute_actual_srf_parameters_mutator(Node *node,
@@ -4842,7 +4840,7 @@ sql_inline_error_callback(void *arg)
  * We use the executor's routine ExecEvalExpr() to avoid duplication of
  * code and ensure we get the same result as the executor would get.
  */
-static Expr *
+Expr *
 evaluate_expr(Expr *expr, Oid result_type, int32 result_typmod,
 			  Oid result_collation)
 {
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e548476623..ebdb0f7b18 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -180,6 +180,8 @@ static Node *makeXmlExpr(XmlExprOp op, char *name, List *named_args,
 static List *mergeTableFuncParameters(List *func_args, List *columns);
 static TypeName *TableFuncTypeName(List *columns);
 static RangeVar *makeRangeVarFromAnyName(List *names, int position, core_yyscan_t yyscanner);
+static Node *makePartRangeDatum(PartitionRangeDatumKind kind, Node *value,
+int location);
 static void SplitColQualList(List *qualList,
 			 List **constraintList, CollateClause **collClause,
 			 core_yyscan_t yyscanner);
@@ -468,7 +470,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	columnDef columnOptions
 %type 	def_elem reloption_elem old_aggr_elem operator_def_elem
 %type 	def_arg columnElem where_clause where_or_current_clause
-a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
+a_expr u_expr b_expr c_expr
+AexprConst indirection_el opt_slice_bound
 columnref in_expr having_clause func_table xmltable array_expr
 ExclusionWhereClause operator_def_arg
 %type 	rowsfrom_item rowsfrom_list opt_col_def_list
@@ -581,7 +584,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	part_elem
 %type 		part_params
 %type  PartitionBoundSpec
-%type 		partbound_datum PartitionRangeDatum
+%type 		PartitionRangeDatum
 %type 		hash_partbound partbound_datum_list range_datum_list
 %type 		hash_partbound_elem
 
@@ -734,8 +737,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  * for RANGE, ROWS, GROUPS so that they can follow a_expr without creating
  * postfix-operator problems;
  * for GENERATED so that it can follow b_expr;
- * and for NULL so that it can follow b_expr in ColQualList without creating
- * postfix-operator problems.
+ * for NULL so that it can follow b_expr in ColQualList without creating
+ * postfix-operator problems;
+ * for ROWS to support opt_existing_window_frame;
+ * for ROW to support row;
+ * for PASSING, BY and COLUMNS to support xmltable;

Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2018-04-16 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> * it's also been claimed that readahead heuristics are not
 Thomas> defeated on Linux or FreeBSD, which isn't too surprising
 Thomas> because you'd expect it to be about blocks being faulted in,
 Thomas> not syscalls

I don't know about Linux, but on FreeBSD, readahead/writebehind is
tracked at the level of open files but implemented at the level of
read/write clustering. I have patched kernels in the past to improve the
performance in mixed read/write cases; pg would benefit on unpatched
kernels from using separate file opens for backend reads and writes.
(The typical bad scenario is doing a create index, or other seqscan that
updates hint bits, on a freshly-restored table; the alternation of
reading block N and writing block N-x destroys the readahead/writebehind
since they use a common offset.)

The code that detects sequential behavior cannot distinguish between
pread() and lseek+read; it looks only at the actual offset of the
current request compared to the previous one for the same fp.
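
For reference, a minimal POSIX-level sketch of the difference being
discussed (illustrative only, not a patch; USE_LSEEK_READ is just a
switch for the sketch): lseek+read costs two syscalls and goes through
the fd's shared file offset, while pread is one syscall and leaves that
offset untouched, so interleaved reads and writes at different positions
don't fight over it.

    #include <unistd.h>

    static ssize_t
    read_block(int fd, void *buf, size_t blcksz, off_t blockno)
    {
        off_t   off = blockno * (off_t) blcksz;

    #ifdef USE_LSEEK_READ
        /* historical pattern: two syscalls, shared offset mutated */
        if (lseek(fd, off, SEEK_SET) < 0)
            return -1;
        return read(fd, buf, blcksz);
    #else
        /* proposed pattern: one syscall, explicit offset */
        return pread(fd, buf, blcksz, off);
    #endif
    }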

 Thomas> +1 for adopting pread()/pwrite() in PG12.

ditto

-- 
Andrew (irc:RhodiumToad)