Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Andres Freund
On 2017-04-05 04:25:41 +, Tsunakawa, Takayuki wrote:
> From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com]
> > On 5 April 2017 at 10:37, Tsunakawa, Takayuki
> >  wrote:
> > 
> > OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock
> > Pages in Memory, using the attached pg_ctl.c.  Please see
> > EnableLockPagesPrivilege() and its call site.  But pg_ctl -w start fails
> > emitting the following message:
> > 
> > That won't work. You'd have to pass 0 to the flags of CreateRestrictedToken
> > and instead supply a PrivilegesToDelete array.
> > You'd probably GetTokenInformation and AND with a mask of ones you wanted
> > to retain.
> 
> Uh, that's inconvenient.  We can't determine what privileges to delete, and 
> we must be aware of new privileges added in future versions of Windows.
> 
> Then, I have to say the last patch (v12) is the final answer.

As I asked before, why can't we delete all privs and add the explicitly
needed ones back (using AdjustTokenPrivileges)?
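
Roughly along these lines, I'd imagine (an untested sketch, error handling
elided; whether a privilege stripped by CreateRestrictedToken can still be
re-enabled this way is exactly what would need testing):

#include <windows.h>

/*
 * Untested sketch: try to (re)enable the Lock Pages in Memory privilege on
 * the given access token.  Returns 0 on success, -1 on failure.
 */
static int
enable_lock_pages_priv(HANDLE token)
{
	TOKEN_PRIVILEGES tp;
	LUID		luid;

	if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luid))
		return -1;

	tp.PrivilegeCount = 1;
	tp.Privileges[0].Luid = luid;
	tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

	if (!AdjustTokenPrivileges(token, FALSE, &tp, 0, NULL, NULL))
		return -1;

	/*
	 * AdjustTokenPrivileges() reports success even when it could not assign
	 * the privilege, so check for ERROR_NOT_ALL_ASSIGNED explicitly.
	 */
	if (GetLastError() == ERROR_NOT_ALL_ASSIGNED)
		return -1;

	return 0;
}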

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [pgsql-www] Small issue in online devel documentation build

2017-04-04 Thread Noah Misch
On Thu, Mar 23, 2017 at 11:21:39PM -0400, Peter Eisentraut wrote:
> I think the fix belongs into the web site CSS, so there is nothing to
> commit into PostgreSQL here.  I will close the commit fest entry, but I
> have added a section to the open items list so we keep track of it.
> (https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain)

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: extended stats not friendly towards ANALYZE with subset of columns

2017-04-04 Thread Noah Misch
On Tue, Mar 28, 2017 at 10:30:03PM +1300, David Rowley wrote:
> I'm just reviewing Tomas' code for the dependencies part of the stats
> when I saw something that looked a bit unusual.
> 
> I tested with:
> 
> CREATE TABLE ab1 (a INTEGER, b INTEGER);
> ALTER TABLE ab1 ALTER a SET STATISTICS 0;
> INSERT INTO ab1 SELECT a, a%23 FROM generate_series(1, 1000) a;
> CREATE STATISTICS ab1_a_b_stats ON (a, b) FROM ab1;
> ANALYZE ab1;
> 
> And got:
> 
> ERROR:  extended statistics could not be collected for column "a" of
> relation public.ab1
> HINT:  Consider ALTER TABLE "public"."ab1" ALTER "a" SET STATISTICS -1
> 
> I don't think the error is useful here, as it means nothing gets done.
> Probably better to just not (re)build those stats.
> 
> Another option would be to check for extended stats before deciding
> which rows to ANALYZE, then still gathering the columns required for
> MV stats, but I think that if the user asks for a subset of columns to be
> analyzed, and that causes a column to be missing for an extended
> statistic, it would be pretty surprising if we rebuilt the
> extended stats.
> 
> Perhaps the SET STATISTICS 0 for a column still needs to gather data
> for extended statistics, though. Perhaps a debate should ensue about
> how that should work exactly.
> 
> I've attached a patch which fixes the problem above, but it does
> nothing to change the analyze behaviour for 0 statistics columns.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Álvaro,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Sync pg_dump and pg_dumpall output

2017-04-04 Thread Noah Misch
On Fri, Mar 24, 2017 at 12:26:33PM +0900, Michael Paquier wrote:
> On Thu, Mar 23, 2017 at 1:10 AM, Stephen Frost  wrote:
> > * Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
> >> On 03/22/2017 11:39 AM, Stephen Frost wrote:
> >> > * Andrew Dunstan (and...@dunslane.net) wrote:
> >> >> Sync pg_dump and pg_dumpall output
> >> > This probably should have adjusted all callers of pg_dump in the
> >> > regression tests to use the --no-sync option, otherwise we'll end up
> >> > spending possibly a good bit of time calling fsync() during the
> > > regression tests unnecessarily.
> >>
> >> All of them? The impact is not likely to be huge in most cases
> >> (possibly different on Windows). On crake, the bin-check stage actually
> >> took less time after the change than before, so I suspect that the
> >> impact will be pretty small.
> >
> > Well, perhaps not all, but I'd think --no-sync would be better to have
> > in most cases.  We turn off fsync for all of the TAP tests and all
> > initdb calls, so it seems like we should here too.  Perhaps it won't be
> > much overhead on an unloaded system, but not all of the buildfarm
> > animals seem to be on unloaded systems, nor are they particularly fast
> > in general or blessed with fast I/O.
> 
> Agreed.
> 
> >> Still I agree that we should have tests for both cases.
> >
> > Perhaps, though if I recall correctly, we've basically had zero calls
> > for fsync() until this.  If we don't feel like we need to test that in
> > the backend then it seems a bit silly to feel like we need it for
> > pg_dump's regression coverage.
> >
> > That said, perhaps the right answer is to have a couple tests for both
> > the backend and for pg_dump which do exercise the fsync-enabled paths.
> 
> And agreed.
> 
> The patch attached uses --no-sync in most places for pg_dump, except
> in 4 places of 002_pg_dump.pl to stress as well the sync code path.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Andrew,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-04-04 Thread Noah Misch
On Fri, Mar 10, 2017 at 11:49:46AM -0800, Andres Freund wrote:
> On 2017-03-09 13:34:22 -0500, Robert Haas wrote:
> > On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
> > > Andres Freund  writes:
> > >> Wonder if we there's an argument to be made for implementing this
> > >> roughly similarly to split_pathtarget_at_srf - instead of injecting a
> > >> ProjectSet node we'd add a FunctionScan node below a Result node.
> > >
> > > Yeah, possibly.  That would have the advantage of avoiding an ExecProject
> > > step when the SRFs aren't buried, which would certainly be the expected
> > > case.
> > >
> > > If you don't want to make ExecInitExpr responsible, then the planner would
> > > have to do something like split_pathtarget_at_srf anyway to decompose the
> > > expressions, no matter which executor representation we use.
> > 
> > Did we do anything about this?  Are we going to?
> 
> Working on a patch.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Andres,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-04 Thread Noah Misch
On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
> Regarding this feature, there are some loose ends. We should work on
> and complete them until the release.
> 
> (1)
> Which synchronous replication method, priority or quorum, should be
> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
> a priority-based sync replication is chosen for keeping backward
> compatibility. However some hackers argued to change this decision
> so that a quorum commit is chosen because they think that most users
> prefer a quorum.
> 
> (2)
> There are still many source comments and documentation sections that
> we need to update, for example, in high-availability.sgml. We need to
> check and update them thoroughly.
> 
> (3)
> The priority value is assigned to each standby listed in s_s_names
> even in quorum commit though those priority values are not used at all.
> Users can see those priority values in pg_stat_replication.
> Isn't this confusing? If yes, it might be better to always assign 1 as
> the priority, for example.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tatsuo Ishii
>> I have done tests using pgproto. One thing I noticed is a strange behavior.
>> Below is an output of pgproto. The test first set the timeout to 3 seconds,
>> and parse/bind for "SELECT pg_sleep(2)" then set timeout to 1 second using
>> extended query. Subsequent Execute emits a statement timeout error as
>> expected, but next "SELECT pg_sleep(2)"
>> call using extended query does not emit a statement error. The test for
>> this is "007-timeout-twice". Attached is the test cases including this.
> 
> What's the handling of transactions like in pgproto?  I guess the first 
> statement timeout error rolled back the effect of "SET statement_timeout = 
> '1s'", and the timeout reverted to 3s or some other value.

Since pgproto is a dumb protocol machine, it does not start a
transaction automatically (user needs to explicitly send a start
transaction command via either simple or extended query). In this
particular case no explicit transaction has started.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-04 Thread Ashutosh Bapat
On Wed, Apr 5, 2017 at 8:39 AM, Robert Haas  wrote:
> On Tue, Apr 4, 2017 at 10:22 AM, Ashutosh Bapat
>  wrote:
>> Yes, I agree. For an inner join, the partition key types need to "shrink"
>> and for outer join they need to be "widened". I don't know if there is a way
>> to know "wider" or "shorter" of two given types. We might have to implement
>> a method to merge partition keys to produce partition key of the join, which
> > may be different from either of the partition keys. So, after all we may
>> have to abandon the idea of canonical partition scheme. I haven't included
>> this change in the attached set of patches.
>
> I think this is why you need to regard the partitioning scheme as
> something more like an equivalence class - possibly the partitioning
> scheme should actually contain (or be?) an equivalence class.  Suppose
> this is the query:
>
> SELECT * FROM i4 INNER JOIN i8 ON i4.x = i8.x;
>
> ...where i4 (x) is an int4 partitioning key and i8 (x) is an int8
> partitioning key.  It's meaningless to ask whether the result of the
> join is partitioned by int4 or int8.  It's partitioned by the
> equivalence class that contains both i4.x and i8.x.  If the result of
> this join where joined to another table on either of those two
> columns, a second partition-wise join would be theoretically possible.
> If you insist on knowing the type of the partitioning scheme, rather
> than just the opfamily, you've boxed yourself into a corner from which
> there's no good escape.

Only inner join conditions have equivalence classes associated with
them. Outer join conditions create single-element equivalence
classes. So we cannot associate equivalence classes as they are with
the partition scheme. If we could do that, it would make life much
easier, since checking whether an equi-join exists between all
partition keys would simply be a matter of looking up the equivalence
classes that cover the joining relations and finding the em_member
corresponding to each partition key.

It looks like we should only keep strategy, partnatts, partopfamily
and parttypcoll in PartitionScheme. A partition-wise join between two
relations would be possible if all those match. When matching
partition bounds of joining relations, we should rely on partopfamily
to give us a comparison function based on the types of the partition keys
being joined. In that context it looks like all the partition bound
comparison functions which accept a partition key were not written
keeping this use case in mind. They will need to be rewritten to
accept strategy, partnatts, partopfamily and parttypcoll.
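
Roughly this shape, I think (only a sketch; the struct and field names here
are taken from this discussion, not from actual code):

/* Sketch only: a trimmed-down partitioning scheme descriptor. */
typedef struct PartitionSchemeData
{
	char		strategy;		/* list or range partitioning */
	int16		partnatts;		/* number of partition key columns */
	Oid		   *partopfamily;	/* operator family OID per key column */
	Oid		   *parttypcoll;	/* collation OID per key column */
} PartitionSchemeData;

typedef PartitionSchemeData *PartitionScheme;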

There's a relevant comment in 0006, build_joinrel_partition_info()
(probably that name needs to change, but I will do that once we have
settled on design)
+   /*
+* Construct partition keys for the join.
+*
+* An INNER join between two partitioned relations is partitioned by key
+* expressions from both the relations. For tables A and B partitioned by a
+* and b respectively, (A INNER JOIN B ON A.a = B.b) is partitioned by both
+* A.a and B.b.
+*
+* An OUTER join like (A LEFT JOIN B ON A.a = B.b) may produce rows with
+* B.b NULL. These rows may not fit the partitioning conditions imposed on
+* B.b. Hence, strictly speaking, the join is not partitioned by B.b.
+* Strictly speaking, partition keys of an OUTER join should include
+* partition key expressions from the OUTER side only. Consider a join like
+* (A LEFT JOIN B ON (A.a = B.b) LEFT JOIN C ON B.b = C.c). If we do not
+* include B.b as partition key expression for (AB), it prohibits us from
+* using partition-wise join when joining (AB) with C as there is no
+* equi-join between partition keys of joining relations. But two NULL
+* values are never equal and no two rows from mis-matching partitions can
+* join. Hence it's safe to include B.b as partition key expression for
+* (AB), even though rows in (AB) are not strictly partitioned by B.b.
+*/

I think that also needs to be reviewed carefully. Partition-wise joins
may be happy including partition keys from all sides, but
partition-wise aggregates may not be, esp. when pushing complete
aggregation down to partitions. In that case, rows with a NULL partition
key, which fall on the nullable side of the join, will be spread across
multiple partitions. Probably, we should separate nullable and
non-nullable partition key expressions.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-04-04 Thread Sven R. Kunze

Thanks Tomas and David for hacking on this patch.

On 04.04.2017 20:19, Tomas Vondra wrote:
> I'm not sure we still need the min_group_size, when evaluating
> dependencies. It was meant to deal with 'noisy' data, but I think that
> after switching to the 'degree' it might actually be a bad idea.
>
> Consider this:
>
> create table t (a int, b int);
> insert into t select 1, 1 from generate_series(1, 1) s(i);
> insert into t select i, i from generate_series(2, 2) s(i);
> create statistics s with (dependencies) on (a,b) from t;
> analyze t;
>
> select stadependencies from pg_statistic_ext ;
>   stadependencies
>
>  [{1 => 2 : 0.44}, {2 => 1 : 0.44}]
> (1 row)
>
> So the degree of the dependency is just ~0.333 although it's obviously
> a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The
> reason is that we discard 2/3 of rows, because those groups are only a
> single row each, except for the one large group (1/3 of rows).


Just so I can follow the comments better: is "dependency" roughly the
same as what statisticians call "conditional probability"?


Sven


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tatsuo Ishii
> I have done tests using pgproto. One thing I noticed is a strange behavior.
> Below is an output of pgproto. The test first set the timeout to 3 seconds,
> and parse/bind for "SELECT pg_sleep(2)" then set timeout to 1 second using
> extended query. Subsequent Execute emits a statement timeout error as
> expected, but next "SELECT pg_sleep(2)"
> call using extended query does not emit a statement error. The test for
> this is "007-timeout-twice". Attached is the test cases including this.

What's the handling of transactions like in pgproto?  I guess the first 
statement timeout error rolled back the effect of "SET statement_timeout = 
'1s'", and the timeout reverted to 3s or some other value.

Regards
Takayuki Tsunakawa




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
>> What do you think?  I've not really tested this with the extended protocol,
>> so I'd appreciate if you could rerun your test from the older thread.
> 
> The patch looks good and cleaner.  It looks like the code works as expected.  
> As before, I ran one INSERT statement with PgJDBC, with gdb's breakpoints set 
> on enable_timeout() and disable_timeout().  I confirmed that enable_timeout() 
> is called just once at Parse message, and disable_timeout() is called just 
> once at Execute message.
> 
> I'd like to wait for Tatsuo-san's thorough testing with pgproto.

I have done tests using pgproto. One thing I noticed is a strange
behavior. Below is an output of pgproto. The test first set the
timeout to 3 seconds, and parse/bind for "SELECT pg_sleep(2)" then set
timeout to 1 second using extended query. Subsequent Execute emits a
statement timeout error as expected, but next "SELECT pg_sleep(2)"
call using extended query does not emit a statement error. The test
for this is "007-timeout-twice". Attached is the test cases including
this.

FE=> Query(query="SET statement_timeout = '3s'")
<= BE CommandComplete(SET)
<= BE ReadyForQuery(I)
FE=> Parse(stmt="S1", query="SELECT pg_sleep(2)")
FE=> Bind(stmt="S1", portal="S2")
FE=> Parse(stmt="", query="SET statement_timeout = '1s'")
FE=> Bind(stmt="", portal="")
FE=> Execute(portal="")
FE=> Execute(portal="S2")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE ParseComplete
<= BE BindComplete
<= BE CommandComplete(SET)
<= BE ErrorResponse(S ERROR V ERROR C 57014 M canceling statement due to 
statement timeout F postgres.c L 2968 R ProcessInterrupts )
<= BE ReadyForQuery(I)
FE=> Parse(stmt="S3", query="SELECT pg_sleep(2)")
FE=> Bind(stmt="S3", portal="S2")
FE=> Execute(portal="S2")
FE=> Sync
<= BE ParseComplete
<= BE BindComplete
<= BE DataRow
<= BE CommandComplete(SELECT 1)
<= BE ReadyForQuery(I)
FE=> Terminate



tests.tar.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread Rushabh Lathia
On Wed, Apr 5, 2017 at 10:59 AM, Amit Langote  wrote:

> On 2017/04/05 6:22, Keith Fiske wrote:
> > On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
> >> Please find attached an updated patch.
> >> Following has been accomplished in this update:
> >>
> >> 1. A new partition can be added after default partition if there are no
> >> conflicting rows in default partition.
> >> 2. Solved the crash reported earlier.
> >
> > Installed and compiled against commit
> > 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> > -0400) without any issue
> >
> > However, running your original example, I'm getting this error
> >
> > keith@keith=# CREATE TABLE list_partitioned (
> > keith(# a int
> > keith(# ) PARTITION BY LIST (a);
> > CREATE TABLE
> > Time: 4.933 ms
> > keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> > VALUES IN (DEFAULT);
> > CREATE TABLE
> > Time: 3.492 ms
> > keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> > IN (4,5);
> > ERROR:  unrecognized node type: 216
>
> It seems like the new ExecPrepareCheck should be used in the place of
> ExecPrepareExpr in the code added in check_new_partition_bound().
>
> > Also, I'm still of the opinion that denying future partitions of values in
> > the default would be a cause of confusion. In order to move the data out of
> > the default and into a proper child it would require first removing that
> > data from the default, storing it elsewhere, creating the child, then
> > moving it back. If it's only a small amount of data it may not be a big
> > deal, but if it's a large amount, that could cause quite a lot of
> > contention if done in a single transaction. Either that or the user would
> > have to deal with data existing in the table, disappearing, then
> > reappearing again.
> >
> > This also makes it harder to migrate an existing table easily. Essentially
> > no child tables for a large, existing data set could ever be created
> > without going through one of the two situations above.
>
> I thought of the following possible way to allow future partitions when
> the default partition exists which might contain rows that belong to the
> newly created partition (unfortunately, nothing that we could implement at
> this point for v10):
>
> Suppose you want to add a new partition which will accept key=3 rows.
>
> 1. If no default partition exists, we're done; no key=3 rows would have
>    been accepted by any of the table's existing partitions, so no need to
>    move any rows
>
> 2. Default partition exists which might contain key=3 rows, which we'll
>    need to move.  If we do this in the same transaction, as you say, it
>    might result in unnecessary unavailability of table's data.  So, it's
>    better to delegate that responsibility to a background process.  The
>    current transaction will only add the new key=3 partition, so any key=3
>    rows will be routed to the new partition from this point on.  But we
>    haven't updated the default partition's constraint yet to say that it
>    no longer contains key=3 rows (constraint that the planner consumes),
>    so it will continue to be scanned for queries that request key=3 rows
>    (there should be some metadata to indicate that the default partition's
>    constraint is invalid), along with the new partition.
>
> 3. A background process receives a "work item" requesting it to move all
>    key=3 rows from the default partition heap to the new partition's heap.
>    The movement occurs without causing the table to become unavailable;
>    once all rows have been moved, we momentarily lock the table to update
>    the default partition's constraint to mark it valid, so that it will
>    no longer be accessed by queries that want to see key=3 rows.
>
> Regarding 2, there is a question of whether it should not be possible for
> the row movement to occur in the same transaction.  Somebody may want that
> to happen because they chose to run the command during a maintenance
> window, when the table's becoming unavailable is not an issue.  In that
> case, we need to think of the interface more carefully.
>
> Regarding 3, I think the new autovacuum work items infrastructure added by
> the following commit looks very promising:
>
> * BRIN auto-summarization *
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4
>
> > However, thinking through this, I'm not sure I can see a solution without
> > the global index support. If this restriction is removed, there's still an
> > issue of data duplication after the necessary child table is added. So I
> > guess it's a matter of deciding which user experience is better for the
> > moment?
>
> I'm not sure about the fate of this particular patch for v10, but until we
> implement a solution to move rows and design appropriate interface for the
> same, we could error out if moving rows is required at all, like the patch
> does.
>
>

Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread Emre Hasegeli
> Interested to hear comments on this.


I don't have a chance to test it right now, but I am sure it would be an
improvement over what we have right now.  There is no single correct
equation with so many unknowns as we have.

>   *indexTotalCost += (numTuples * *indexSelectivity) *
>   (cpu_index_tuple_cost + qual_op_cost);

Have you preserved this line intentionally?  This would make BRIN index
scan cost even higher, but the primary property of BRIN is to be cheap to
scan. Shouldn't the bitmap heap scan node take the cost of rechecking the
tuples into account?


Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread Amit Langote
On 2017/04/05 6:22, Keith Fiske wrote:
> On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote:
>> Please find attached an updated patch.
>> Following has been accomplished in this update:
>>
>> 1. A new partition can be added after default partition if there are no
>> conflicting rows in default partition.
>> 2. Solved the crash reported earlier.
>
> Installed and compiled against commit
> 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
> -0400) without any issue
> 
> However, running your original example, I'm getting this error
> 
> keith@keith=# CREATE TABLE list_partitioned (
> keith(# a int
> keith(# ) PARTITION BY LIST (a);
> CREATE TABLE
> Time: 4.933 ms
> keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
> VALUES IN (DEFAULT);
> CREATE TABLE
> Time: 3.492 ms
> keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
> IN (4,5);
> ERROR:  unrecognized node type: 216

It seems like the new ExecPrepareCheck should be used in the place of
ExecPrepareExpr in the code added in check_new_partition_bound().

> Also, I'm still of the opinion that denying future partitions of values in
> the default would be a cause of confusion. In order to move the data out of
> the default and into a proper child it would require first removing that
> data from the default, storing it elsewhere, creating the child, then
> moving it back. If it's only a small amount of data it may not be a big
> deal, but if it's a large amount, that could cause quite a lot of
> contention if done in a single transaction. Either that or the user would
> have to deal with data existing in the table, disappearing, then
> reappearing again.
>
> This also makes it harder to migrate an existing table easily. Essentially
> no child tables for a large, existing data set could ever be created
> without going through one of the two situations above.

I thought of the following possible way to allow future partitions when
the default partition exists which might contain rows that belong to the
newly created partition (unfortunately, nothing that we could implement at
this point for v10):

Suppose you want to add a new partition which will accept key=3 rows.

1. If no default partition exists, we're done; no key=3 rows would have
   been accepted by any of the table's existing partitions, so no need to
   move any rows

2. Default partition exists which might contain key=3 rows, which we'll
   need to move.  If we do this in the same transaction, as you say, it
   might result in unnecessary unavailability of table's data.  So, it's
   better to delegate that responsibility to a background process.  The
   current transaction will only add the new key=3 partition, so any key=3
   rows will be routed to the new partition from this point on.  But we
   haven't updated the default partition's constraint yet to say that it
   no longer contains key=3 rows (constraint that the planner consumes),
   so it will continue to be scanned for queries that request key=3 rows
   (there should be some metadata to indicate that the default partition's
   constraint is invalid), along with the new partition.

3. A background process receives a "work item" requesting it to move all
   key=3 rows from the default partition heap to the new partition's heap.
   The movement occurs without causing the table to become unavailable;
   once all rows have been moved, we momentarily lock the table to update
   the default partition's constraint to mark it valid, so that it will
   no longer be accessed by queries that want to see key=3 rows.

Regarding 2, there is a question of whether it should not be possible for
the row movement to occur in the same transaction.  Somebody may want that
to happen because they chose to run the command during a maintenance
window, when the table's becoming unavailable is not an issue.  In that
case, we need to think of the interface more carefully.

Regarding 3, I think the new autovacuum work items infrastructure added by
the following commit looks very promising:

* BRIN auto-summarization *
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4

> However, thinking through this, I'm not sure I can see a solution without
> the global index support. If this restriction is removed, there's still an
> issue of data duplication after the necessary child table is added. So I
> guess it's a matter of deciding which user experience is better for the
> moment?

I'm not sure about the fate of this particular patch for v10, but until we
implement a solution to move rows and design appropriate interface for the
same, we could error out if moving rows is required at all, like the patch
does.

Could you briefly elaborate why you think the lack global index support
would be a problem in this regard?

I agree that some design is required here to implement a solution for
redistribution of rows; not only in the context of supporting the notion
of def

Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/5/17 00:58, Tom Lane wrote:
>> Another issue is whether you won't get compiler complaints about
>> redefinition of the "true" and "false" macros.  But those would
>> likely only be warnings, not flat-out errors.

> The complaint about bool is also just a warning.

Really?

$ cat test.c
typedef char bool;
typedef char bool;
$ gcc -c test.c
test.c:2: error: redefinition of typedef 'bool'
test.c:1: note: previous declaration of 'bool' was here

This is with gcc 4.4.7.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Peter Eisentraut
On 4/5/17 00:58, Tom Lane wrote:
> Another issue is whether you won't get compiler complaints about
> redefinition of the "true" and "false" macros.  But those would
> likely only be warnings, not flat-out errors.

The complaint about bool is also just a warning.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of 'Andres Freund'
> Attached.  I did not like that the previous patch had the timeout handling
> duplicated in the individual functions, I instead centralized it into
> start_xact_command().  Given that it already activated the timeout in the
> most common cases, that seems to make more sense to me. In your version
> we'd have called enable_statement_timeout() twice consecutively (which
> behaviourally is harmless).
> 
> What do you think?  I've not really tested this with the extended protocol,
> so I'd appreciate if you could rerun your test from the older thread.

The patch looks good and cleaner.  It looks like the code works as expected.  
As before, I ran one INSERT statement with PgJDBC, with gdb's breakpoints set 
on enable_timeout() and disable_timeout().  I confirmed that enable_timeout() 
is called just once at Parse message, and disable_timeout() is called just once 
at Execute message.
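
For reference, my understanding of the centralized shape is roughly the
following (a simplified sketch from reading the patch, not the patch itself):

static void
start_xact_command(void)
{
	if (!xact_started)
	{
		StartTransactionCommand();
		xact_started = true;
	}

	/*
	 * Arm the statement timeout if it is not already running; repeated calls
	 * (e.g. for Parse/Bind/Execute of the same statement) do not reset it.
	 */
	enable_statement_timeout();
}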

I'd like to wait for Tatsuo-san's thorough testing with pgproto.

Regards
Takayuki Tsunakawa




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 4, 2017 at 6:56 PM, Joe Conway  wrote:
>> Any objections?

> I'm guessing Tom's going to have a strong feeling about whether 0001a
> is the right way to address the stdbool issue,

I will?  [ looks ... ]  Yup, you're right.

I doubt that works at all, TBH.  What I'd expect to happen with a
typical compiler is a complaint about redefinition of typedef bool,
because c.h already declared it and here this fragment is doing
so again.  It'd make sense to me to do

+ #ifdef bool
+ #undef bool
+ #endif

to get rid of the macro definition of bool that stdbool.h is
supposed to provide.  But there should be no reason to declare
our typedef a second time.

Another issue is whether you won't get compiler complaints about
redefinition of the "true" and "false" macros.  But those would
likely only be warnings, not flat-out errors.
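
In other words, something along these lines (untested):

#include <stdbool.h>

/* If stdbool.h supplied macro versions, drop them so c.h's definitions win. */
#ifdef bool
#undef bool
#endif
#ifdef true
#undef true
#endif
#ifdef false
#undef false
#endif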

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compiler warning in costsize.c

2017-04-04 Thread Michael Paquier
On Wed, Apr 5, 2017 at 2:54 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can
>> generate warnings. Here is for example with MSVC:
>>   src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : 
>> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>>   src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : 
>> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>
>> a9c074ba7 has done an effort, but a bit more is needed as the attached.
>
> That doesn't look right at all:
>
> +#ifdef USE_ASSERT_CHECKING
> RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;
> +#endif
>
> The entire point of the PG_USED_FOR_ASSERTS_ONLY annotation is to suppress
> this type of warning without a need for an explicit #ifdef like that one.
>
> We should either fix PG_USED_FOR_ASSERTS_ONLY to work on MSVC as well,
> or decide that we don't care about suppressing such warnings on MSVC,
> or give up on PG_USED_FOR_ASSERTS_ONLY and remove it everywhere in
> favor of plain #ifdefs.

Visual Studio does not have any equivalent of __attribute__((unused)),
at least none that I could find after looking around. A trick that
I can think of would be to change PG_USED_FOR_ASSERTS_ONLY to be
roughly a macro like this:

#if defined(__GNUC__)
#define PG_ASSERT_ONLY(x) ASSERT_ONLY_ ## x __attribute__((unused))
#else
#define PG_ASSERT_ONLY(x) ((void) x)
#endif

But that's ugly...
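
Another possibility (also just a sketch, not verified on MSVC) would be to
keep the current spelling and silence that specific warning instead:

/* Sketch: keep PG_USED_FOR_ASSERTS_ONLY as-is, quiet C4101 under MSVC. */
#if defined(__GNUC__) || defined(__clang__)
#define PG_USED_FOR_ASSERTS_ONLY __attribute__((unused))
#else
#define PG_USED_FOR_ASSERTS_ONLY
#endif

#ifdef _MSC_VER
#pragma warning(disable: 4101)	/* 'identifier': unreferenced local variable */
#endif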

> (I'm personally not that much in love with PG_USED_FOR_ASSERTS_ONLY,
> because it tends to confuse pgindent.)

I would be inclined to just do that; any other solution I can think of
is uglier than that.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-05 10:05:19 +0900, Tatsuo Ishii wrote:
>> What's the point of your question? What kind of problem do you expect
>> if the timeout starts only once at the first Parse message out of a
>> bunch of Parse messages?

> It's perfectly valid to send a lot of Parse messages without
> interspersed Sync or Bind/Execute message.  There'll be one timeout
> covering all of those Parse messages, which can thus lead to a timeout,
> even though nothing actually takes long individually.

It might well be reasonable to redefine statement_timeout as limiting the
total time from the first client input to the response to Sync ... but
if that's what we're doing, let's make sure we do it consistently.

I haven't read the patch, but the comments in this thread make me fear
that it's introducing some ad-hoc, inconsistent behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread David Rowley
On 3 April 2017 at 03:05, Emre Hasegeli  wrote:
> Unfortunately, I am on vacation for two weeks without my computer.  I can
> post another version after 18th.  I know we are under time pressure for
> release.  I wouldn't mind if you or Alvaro would change it any way you like.

I've made some changes. Actually, I completely changed how the
estimates work. I find this method more self-explanatory.

Basically, we work out the total number of index ranges, then work out how many
of those we'd touch in a perfectly correlated scenario. We then work
out how many ranges we'll probably visit based on the correlation
estimates from the stats, and assume the selectivity is probableRanges
/ totalIndexRanges.
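
In rough code, the idea is something like the following (a simplified
sketch; the variable names are for illustration and won't match the patch
exactly):

#include <math.h>

/* Simplified sketch of the range-based selectivity idea, not the patch. */
static double
brin_range_selectivity(double numPages, double pagesPerRange,
					   double qualSelectivity, double correlation)
{
	double		totalRanges = fmax(ceil(numPages / pagesPerRange), 1.0);

	/* ranges we'd have to visit if the heap were perfectly correlated */
	double		minimalRanges = ceil(totalRanges * qualSelectivity);
	double		probableRanges;

	/* as |correlation| drops towards zero, expect to visit more ranges */
	if (correlation == 0.0)
		probableRanges = totalRanges;
	else
		probableRanges = fmin(minimalRanges / (correlation * correlation),
							  totalRanges);

	return probableRanges / totalRanges;
}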

I've attached a spreadsheet that compares Emre's method to mine. Mine
seems to favour the BRIN index less when the table is small. I think
this is pretty natural since if there is only 1 range, and we narrow
the result to one of them, then we might as well have performed a
seqscan.

My method seems to favour BRIN a bit longer when the correlation is
between about 1% and 100%, but it penalises BRIN much more when the
correlation is less than around 1%. This may be better: my way is
certainly smarter than the unpatched version, but it still holds on a bit
longer, which may be more favourable if a BRIN index actually exists.
It might be more annoying for a user to have added a BRIN index and
never have the planner choose it.

My method also never suffers from estimating > 100% of the table.

I was a bit worried that Emre's method would penalise BRIN too much
when the correlation is not so high.

Interested to hear comments on this.

Please feel free to play with the spreadsheet by changing rows 1-3 in column B.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


brin-correlation-drowley_v1.patch
Description: Binary data


BRIN_estimates2.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com]
> On 5 April 2017 at 10:37, Tsunakawa, Takayuki
>  wrote:
> 
> OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock
> Pages in Memory, using the attached pg_ctl.c.  Please see
> EnableLockPagesPrivilege() and its call site.  But pg_ctl -w start fails
> emitting the following message:
> 
> That won't work. You'd have to pass 0 to the flags of CreateRestrictedToken
> and instead supply a PrivilegesToDelete array.
> You'd probably GetTokenInformation and AND with a mask of ones you wanted
> to retain.

Uh, that's inconvenient.  We can't determine what privileges to delete, and we 
must be aware of new privileges added in future versions of Windows.

Then, I have to say the last patch (v12) is the final answer.

Regards
Takayuki Tsunakawa


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-04 Thread Michael Paquier

On Wed, Apr 5, 2017 at 7:05 AM, Heikki Linnakangas  wrote:
> I will continue tomorrow, but I wanted to report on what I've done so far.
> Attached is a new patch version, quite heavily modified. Notable changes so
> far:

Great, thanks!

> * Use Unicode codepoints, rather than UTF-8 bytes packed in 32-bit ints.
> IMHO this makes the tables easier to read (to a human), and they are also
> packed slightly more tightly (see next two points), as you can fit more
> codepoints in a 16-bit integer.

Using codepoints directly is not very consistent with the existing
backend, but for the sake of packing things more tightly, OK.

> * Reorganize the decomposition table. Instead of a separate binary-searched
> table for each "size", have one table that's ordered by codepoint, which
> contains a length and offset into another array, where all the decomposed
> sequences are. This makes the recomposition step somewhat slower, as it now
> has to grovel through an array that contains all decompositions, rather than
> just the array that contains decompositions of two characters. But this
> isn't performance-critical, readability and tight packing of the tables
> matter more.

Okay, no objection to that.

> * In the lookup table, store singleton decompositions that decompose to a
> single character, and the character fits in a uint16, directly in the main
> lookup table, instead of storing an index into the other table. This makes
> the tables somewhat smaller, as there are a lot of such decompositions.

Indeed.
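
(So each entry in the main lookup table would carry something roughly like
this, if I read the patch correctly; the field names here are mine:)

/* Rough sketch of a main-table entry; names are illustrative only. */
typedef struct
{
	uint32		codepoint;		/* Unicode codepoint */
	uint8		dec_size_flags;	/* size of the decomposition, plus flag bits */
	uint16		dec_index;		/* offset into the shared array of decomposed
								 * codepoints, or, for a singleton that fits
								 * in 16 bits, the decomposed codepoint itself */
} unicode_decomp_entry;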

> * Use uint32 instead of pg_wchar to represent unicode codepoints. pg_wchar
> suggests something that holds multi-byte characters in the OS locale, used
> by functions like wcscmp, so avoid that. I added a "typedef uint32
> Codepoint" alias, though, to make it more clear which variables hold Unicode
> codepoints rather than e.g. indexes into arrays.
>
> * Unicode consortium publishes a comprehensive normalization test suite, as
> a text file, NormalizationTest.txt. I wrote a small perl and C program to
> run all the tests from that test suite, with the normalization function.
> That uncovered a number of bugs in the recomposition algorithm, which I then
> fixed:

I was looking for something like that for some time, thanks! That's
here actually:
http://www.unicode.org/Public/8.0.0/ucd/NormalizationTest.txt

>  * decompositions that map to ascii characters cannot be excluded.
>  * non-canonical and non-starter decompositions need to be excluded. They
> are now flagged in the lookup table, so that we only use them during the
> decomposition phase, not when recomposing.

Okay on that.

> TODOs / Discussion points:
>
> * I'm not sure I like the way the code is organized, I feel that we're
> littering src/common with these. Perhaps we should add a
> src/common/unicode_normalization directory for all this.

pg_utf8_islegal() and pg_utf_mblen() should be moved into their
own file as well, I think, and wchar.c can use that.

> * The list of characters excluded from recomposition is currently hard-coded
> in utf_norm_generate.pl. However, that list is available in machine-readable
> format, in file CompositionExclusions.txt. Since we're reading most of the
> data from UnicodeData.txt, would be good to read the exclusion table from a
> file, too.

Ouch. Those are present here...
http://www.unicode.org/reports/tr41/tr41-19.html#Exclusions
Definitely it makes more sense to read them from a file.

> * SASLPrep specifies normalization form KC, but it also specifies that some
> characters are mapped to space or nothing. Should do those mappings, too.

Ah, right. Those ones are here:
https://tools.ietf.org/html/rfc3454#appendix-B.1
The conversion tables need some extra tweaking...

+   else if ((*utf & 0xf0) == 0xe0)
+   {
+   if (utf[1] == 0 || utf[2] == 0)
+   goto invalid;
+   cp = (*utf++) & 0x0F;
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   }
+   else if ((*utf & 0xf8) == 0xf0)
+   {
+   if (utf[1] == 0 || utf[2] == 0|| utf[3] == 0)
+   goto invalid;
+   cp = (*utf++) & 0x07;
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   cp = (cp << 6) | ((*utf++) & 0x3F);
+   }
I find something like pg_utf2wchar_with_len() more readable, but well...
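
Something along these lines, perhaps (an untested sketch, leaving out the
validation that pg_utf8_islegal() would do beforehand):

/* Untested sketch: decode one UTF-8 sequence into a Unicode codepoint. */
static uint32
utf8_to_codepoint(const unsigned char *utf, int *consumed)
{
	uint32		cp;
	int			len;
	int			i;

	if (*utf < 0x80)
	{
		*consumed = 1;
		return *utf;
	}
	else if ((*utf & 0xe0) == 0xc0)
	{
		len = 2;
		cp = *utf & 0x1f;
	}
	else if ((*utf & 0xf0) == 0xe0)
	{
		len = 3;
		cp = *utf & 0x0f;
	}
	else
	{
		len = 4;
		cp = *utf & 0x07;
	}

	for (i = 1; i < len; i++)
		cp = (cp << 6) | (utf[i] & 0x3f);

	*consumed = len;
	return cp;
}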

Some typos:
s/fore/for/
s/reprsented/represented/

You seem to be fully on it... I can help out with any of those items as needed.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread Peter Eisentraut
On 4/4/17 12:55, Ashutosh Sharma wrote:
> As I have not seen any response from Tomas for the last 2-3 days and since
> the commit-fest is coming towards its end, I have planned to work on the
> review comments that I had given a few days back and submit the updated
> patch. PFA a new version of the patch that takes care of all the review
> comments given by me. I have also run pgindent on btreefuncs.c and
> have run some basic test cases. All looks fine to me now!

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-04 Thread Peter Eisentraut
On 4/4/17 22:47, Amit Kapila wrote:
>> Committed first part to allow internal representation change (only).
>>
>> No commitment yet to increasing wal-segsize in the way this patch has it.
>>
> 
> What part of patch you don't like and do you have any suggestions to
> improve the same?

I think there are still some questions and disagreements about how it
should behave.

I suggest the next step is to dial up the allowed segment size in
configure and run some tests about what a reasonable maximum value could
be.  I did a little bit of that, but somewhere around 256 MB, things got
really slow.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Wed, Apr 5, 2017 at 8:42 AM, Robert Haas  wrote:

> On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee
>  wrote:
> > On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas 
> wrote:
> >>  but
> >> try to access the TOAST table would be fatal; that probably would have
> >> deadlock hazards among other problems.
> >
> > Hmm. I think you're right. We could make a copy of the heap tuple, drop
> the
> > lock and then access TOAST to handle that. Would that work?
>
> Yeah, but it might suck.  :-)


Well, better than causing a deadlock ;-)

Let's see if we want to go down the path of blocking WARM when tuples have
toasted attributes. I submitted a patch yesterday, but having slept over
it, I think I made mistakes there. It might not be enough to look at the
caller-supplied new tuple because that may not have any toasted values, but
the final tuple that gets written to the heap may be toasted. We could look
at the new tuple's attributes to see whether any indexed attributes are
toasted, but that might suck as well. Or we can simply block WARM if the
old or the new tuple has external attributes, i.e. HeapTupleHasExternal()
returns true. That could be overly restrictive because irrespective of
whether the indexed attributes are toasted or just some other attribute is
toasted, we will block WARM on such updates. Maybe that's not a problem.
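
i.e. something as simple as this in the update path (sketch only; the
function name is made up):

/*
 * Sketch only: should this update be allowed to use WARM?  Conservatively
 * refuse whenever either tuple carries any external (toasted) attribute,
 * whether it belongs to an indexed column or not.
 */
static bool
warm_update_allowed(HeapTuple oldtup, HeapTuple newtup)
{
	return !HeapTupleHasExternal(oldtup) && !HeapTupleHasExternal(newtup);
}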

We will also need to handle the case where some older tuple in the chain
has a toasted value and that tuple is presented to recheck (I think we can
handle that case fairly easily, but it's not done in the code yet) because
of a subsequent WARM update, even if the tuples updated by WARM did not have
any toasted values (and hence were allowed).

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee
 wrote:
> On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas  wrote:
>>  but
>> try to access the TOAST table would be fatal; that probably would have
>> deadlock hazards among other problems.
>
> Hmm. I think you're right. We could make a copy of the heap tuple, drop the
> lock and then access TOAST to handle that. Would that work?

Yeah, but it might suck.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 10:22 AM, Ashutosh Bapat
 wrote:
> Yes, I agree. For an inner join, the partition key types need to "shrink"
> and for outer join they need to be "widened". I don't know if there is a way
> to know "wider" or "shorter" of two given types. We might have to implement
> a method to merge partition keys to produce partition key of the join, which
> may be different from either of the partition keys. So, after all we may
> have to abandon the idea of canonical partition scheme. I haven't included
> this change in the attached set of patches.

I think this is why you need to regard the partitioning scheme as
something more like an equivalence class - possibly the partitioning
scheme should actually contain (or be?) an equivalence class.  Suppose
this is the query:

SELECT * FROM i4 INNER JOIN i8 ON i4.x = i8.x;

...where i4 (x) is an int4 partitioning key and i8 (x) is an int8
partitioning key.  It's meaningless to ask whether the result of the
join is partitioned by int4 or int8.  It's partitioned by the
equivalence class that contains both i4.x and i8.x.  If the result of
this join were joined to another table on either of those two
columns, a second partition-wise join would be theoretically possible.
If you insist on knowing the type of the partitioning scheme, rather
than just the opfamily, you've boxed yourself into a corner from which
there's no good escape.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Craig Ringer
On 5 April 2017 at 10:37, Tsunakawa, Takayuki
 wrote:

> Good point!  And as I said earlier in this thread, I think managing privileges
> (adding/revoking privileges from the user account) is the DBA's or sysadmin's 
> duty, and PG's removing all privileges feels overkill.

I think it's a sensible alternative to refusing to run as a highly
privileged role, which is what we used to do IIRC.

> OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock 
> Pages in Memory, using the attached pg_ctl.c.  Please see 
> EnableLockPagesPrivilege() and its call site.  But pg_ctl -w start fails 
> emitting the following message:

That won't work. You'd have to pass 0 to the flags of
CreateRestrictedToken and instead supply a PrivilegesToDelete array.
You'd probably GetTokenInformation and AND with a mask of ones you
wanted to retain.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-04-04 Thread Kyotaro HORIGUCHI
At Tue, 4 Apr 2017 20:19:39 +0200, Tomas Vondra  
wrote in <56f40b20-c464-fad2-ff39-06b668fac...@2ndquadrant.com>
> On 04/04/2017 09:55 AM, David Rowley wrote:
> > On 1 April 2017 at 04:25, David Rowley 
> > wrote:
> >> I've attached an updated patch.
> >
> > I've made another pass at this and ended up removing the tryextstats
> > variable. We now only try to use extended statistics when
> > clauselist_selectivity() is given a valid RelOptInfo with rtekind ==
> > RTE_RELATION, and of course, it must also have some extended stats
> > defined too.
> >
> > I've also cleaned up a few more comments, many of which I managed to
> > omit updating when I refactored how the selectivity estimates ties
> > into clauselist_selectivity()
> >
> > I'm quite happy with all of this now, and would also be happy for
> > other people to take a look and comment.
> >
> > As a reviewer, I'd be marking this ready for committer, but I've moved
> > a little way from just reviewing this now, having spent two weeks
> > hacking at it.
> >
> > The latest patch is attached.
> >
> 
> Thanks David, I agree the reworked patch is much cleaner than the last
> version I posted. Thanks for spending your time on it.
> 
> Two minor comments:
> 
> 1) DEPENDENCY_MIN_GROUP_SIZE
> 
> I'm not sure we still need the min_group_size, when evaluating
> dependencies. It was meant to deal with 'noisy' data, but I think that
> after switching to the 'degree' it might actually be a bad idea.
> 
> Consider this:
> 
> create table t (a int, b int);
> insert into t select 1, 1 from generate_series(1, 10000) s(i);
> insert into t select i, i from generate_series(2, 20000) s(i);
> create statistics s with (dependencies) on (a,b) from t;
> analyze t;
> 
> select stadependencies from pg_statistic_ext ;
>   stadependencies
> 
>  [{1 => 2 : 0.44}, {2 => 1 : 0.44}]
> (1 row)
> 
> So the degree of the dependency is just ~0.333 although it's obviously
> a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The
> reason is that we discard 2/3 of rows, because those groups are only a
> single row each, except for the one large group (1/3 of rows).
> 
> Without the mininum group size limitation, the dependencies are:
> 
> test=# select stadependencies from pg_statistic_ext ;
>   stadependencies
> 
>  [{1 => 2 : 1.00}, {2 => 1 : 1.00}]
> (1 row)
> 
> which seems way more reasonable, I think.

I think the same. Quite a large part of functional dependencies in
reality are of this kind.

> 2) A minor detail is that instead of this
> 
> if (estimatedclauses != NULL &&
> bms_is_member(listidx, estimatedclauses))
> continue;
> 
> perhaps we should do just this:
> 
> if (bms_is_member(listidx, estimatedclauses))
> continue;
> 
> bms_is_member does the same NULL check right at the beginning, so I
> don't think this might make a measurable difference.


I have some other comments.

==
- The comment for clauselist_selectivity,
| + * When 'rel' is not null and rtekind = RTE_RELATION, we'll try to apply
| + * selectivity estimates using any extended statistics on 'rel'.

The 'rel' is actually a parameter but rtekind means rel->rtekind,
so this might be better phrased like the following.

| When a relation of RTE_RELATION is given as 'rel', we try
| extended statistics on the relation.

Then the following line doesn't seem to be required.

| + * If we identify such extended statistics exist, we try to apply them.


=
The following comment in the same function,

| +if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL)
| +{
| +/*
| + * Try to estimate with multivariate functional dependency statistics.
| + *
| + * The function will supply an estimate for the clauses which it
| + * estimated for. Any clauses which were unsuitable were ignored.
| + * Clauses which were estimated will have their 0-based list index set
| + * in estimatedclauses.  We must ignore these clauses when processing
| + * the remaining clauses later.
| + */

(Notice that I'm not a good writer.) This might be better as the
following.

|  dependencies_clauselist_selectivity gives the selectivity over
|  clauses to which functional dependencies on the given relation
|  are applicable. The 0-based index numbers of consumed clauses are
|  returned in the bitmap set estimatedclauses so that the
|  estimation hereafter can ignore them.

=
| +s1 *= dependencies_clauselist_selectivity(root, clauses, varRelid,
| +   jointype, sjinfo, rel, &estimatedclauses);

The name prefix "dependency_" means "functional_dependency" here,
and omitting "functional" is confusing to me. On the other hand,
"functional_dependency" is quite long as a prefix. Could we use
"func_dependency" or something shorter?

Re: [HACKERS] identity columns

2017-04-04 Thread Vitaly Burovoy
On 4/3/17, Peter Eisentraut  wrote:
> On 3/30/17 22:57, Vitaly Burovoy wrote:
>> Why do you still want to leave "ADD IF NOT EXISTS" instead of using
>> "SET IF NOT EXISTS"?
>> If someone wants to follow the standard he can simply not to use "IF
>> NOT EXISTS" at all. Without it error should be raised.
>
> As I tried to mention earlier, it is very difficult to implement the IF
> NOT EXISTS behavior here, because we need to run the commands the create
> the sequence before we know whether we will need it.

I understand. You did not mention the reason.
But now you have the "get_attidentity" function to check whether the
column is an identity or not.
If it is not, create a sequence; otherwise skip the creation.

I'm afraid that after the feature freeze it is impossible to do "major
reworking of things", but after the release we will have to support the
"ALTER COLUMN col ADD" grammar.

> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Here are my next nitpickings on the last patch. I tried to find places
lacking variable initialization.
All other things look good to me now. I'll continue to review the
patch while you're fixing the current notes.
Unfortunately I don't know PG internals deeply enough to judge the
correctness of the changes in the executor (what Tom and Andres are
talking about).


0. There is a little conflict of applying to the current master
because of 60a0b2e.

1.
First of all, I think the previous usage of the constant
"ATTRIBUTE_IDENTITY_NONE" (for setting a value) is better than just
'\0'.
It is OK not to use the constant in comparisons.

2.
Lack of "SET IDENTITY" and "RESTART" in the "Compatibility" chapter in
"alter_table.sgml":
"The forms ADD (without USING INDEX), DROP, SET DEFAULT, and SET DATA
TYPE (without USING) conform with the SQL standard."
Also ADD IDENTITY is an extension (I hope temporary), but it looks
like a standard feature (from the above sentance).

3.
It would be better if tab-completion had the ability to complete the
"RESTART" keyword, like:
ALTER TABLE x1 ALTER COLUMN i RESTART
tab-complete.c:8898
- COMPLETE_WITH_LIST5("TYPE", "SET", "RESET", "ADD", "DROP");
+ COMPLETE_WITH_LIST6("TYPE", "SET", "RESET", "RESTART", "ADD", "DROP");

4.
The block in "gram.y":
/* ALTER TABLE  ALTER [COLUMN]  DROP IDENTITY */
does not contain "n->missing_ok = false;". I think it should.

5.
In the file "pg_dump.c" in the "getTableAttrs" function at row 7959
the "tbinfo->needs_override" is used (in the "||" operator) but it is
initially nowhere set to "false".

6.
And finally... a segfault when pg_dump-ing pre-9.6 databases:
the SQL query in the file "pg_dump.c" in the function "getTables" has the
"is_identity_sequence" column only in the "if (fout->remoteVersion >=
90600)" block (row 5536), whereas 9.6 does not support it (but OK,
for 9.6 it is always FALSE, no sense to create a new "if" block); however,
in the other blocks no ",FALSE as is_identity_sequence" is added.

The same mistake is in the "getTableAttrs" function. Moreover, is
"ORDER BY a.attrelid, a.attnum" really necessary ("else if
(fout->remoteVersion >= 90200)" has only "ORDER BY a.attnum")?


-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] increasing the default WAL segment size

2017-04-04 Thread Amit Kapila
On Wed, Apr 5, 2017 at 3:36 AM, Simon Riggs  wrote:
> On 27 March 2017 at 15:36, Beena Emerson  wrote:
>
>> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
>> the internal representation of max_wal_size and min_wal_size to mb.
>
> Committed first part to allow internal representation change (only).
>
> No commitment yet to increasing wal-segsize in the way this patch has it.
>

Which part of the patch don't you like, and do you have any suggestions
to improve it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-04 Thread Craig Ringer
On 5 April 2017 at 08:23, Craig Ringer  wrote:
> On 5 April 2017 at 08:00, Craig Ringer  wrote:
>
>> Taking a look at this now.
>
> Rebased to current master with conflicts and whitespace errors fixed.
> Review pending.

This patch fails to update the documentation at all.

https://www.postgresql.org/docs/devel/static/spi.html



The patch crashes in initdb with --enable-cassert builds:

performing post-bootstrap initialization ... TRAP:
FailedAssertion("!(myState->pub.mydest == DestSQLFunction)", File:
"functions.c", Line: 800)
sh: line 1: 30777 Aborted (core dumped)
"/home/craig/projects/2Q/postgres/tmp_install/home/craig/pg/10/bin/postgres"
--single -F -O -j -c search_path=pg_catalog -c exit_on_error=true
template1 > /dev/null
child process exited with exit code 134

Backtrace attached.




Details on patch 1:

missing newline

+}
+int
+SPI_execute_callback(const char *src, bool read_only, long tcount,


+/* Execute a previously prepared plan with a callback Destination */


caps "Destination"



+// XXX throw error if callback is set

^^



+static DestReceiver spi_callbackDR = {
+donothingReceive, donothingStartup, donothingCleanup, donothingCleanup,
+DestSPICallback
+};


Is the callback destreceiver supposed to be a blackhole? Why? Its
names, spi_callbackDR and DestSPICallback, don't seem to indicate
that it drops its input.

Presumably that's a default destination you're then supposed to modify
with your own callbacks? There aren't any comments to indicate what's
going on here.



Comments on patch 2:

If this is the "bare minimum" patch, what is pending? Does it impose
any downsides or limits?

+/* Get switch execution contexts */
+extern PLyExecutionContext *PLy_switch_execution_context(PLyExecutionContext *new);

The comment makes no sense to me. This seems to be something like a
memory context switch, where you supply the new one and return the old.
But the comment doesn't explain it at all.

+void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo);
+void PLy_CSDestroy(DestReceiver *self);

These are declared in plpy_spi.c. Why aren't they static?

+volatile MemoryContext oldcontext;
+volatile ResourceOwner oldowner;
 int rv;
-volatile MemoryContext oldcontext;
-volatile ResourceOwner oldowner;

Unnecessary code movement.

In PLy_Callback_New, I think your use of a sub-context is sensible. Is
it necessary to palloc0 though?

+static CallbackState *
+PLy_Callback_Free(CallbackState *callback)

The code here looks like it could be part of spi.c's callback support,
rather than plpy_spi specific, since you provide a destroy callback in
the SPI callback struct.

+/* We need to store this because the TupleDesc the receive function gets has no names. */
+myState->desc = typeinfo;

Is it safe to just store the pointer to the TupleDesc here? What's the lifetime?

+ * will clean it up when the time is right. XXX This might result in a leak
+ * if an error happens and the result doesn't get dereferenced.
+ */
+MemoryContextSwitchTo(TopMemoryContext);
+result->tupdesc = CreateTupleDescCopy(typeinfo);

especially given this XXX comment...


Patch needs bug fix, docs updates, fixes for issues marked in
comments. But overall approach looks sensible enough.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
#0  0x7f9d5d64ea28 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x7f9d5d65062a in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x0083a521 in ExceptionalCondition 
(conditionName=conditionName@entry=0x9b68d0 "!(myState->pub.mydest == 
DestSQLFunction)", errorType=errorType@entry=0x882d29 "FailedAssertion", 
fileName=fileName@entry=0x9b6920 "functions.c", 
lineNumber=lineNumber@entry=800) at assert.c:54
No locals.
#3  0x006166e8 in postquel_start (fcache=0x27772c0, es=0x284eb58) at 
functions.c:800
myState = 0x284eeb8
dest = 0x284eeb8
#4  fmgr_sql (fcinfo=0x27a7a28) at functions.c:1149
fcache = 0x27772c0
sqlerrcontext = {previous = 0x0, callback = 0x614910 
, arg = 0x27a79d0}
randomAccess = 0 '\000'
lazyEvalOK = 
is_first = 
pushed_snapshot = 0 '\000'
es = 0x284eb58
slot = 
result = 
eslist = 
eslc = 0x284eb90
__func__ = "fmgr_sql"
#5  0x0060810b in ExecInterpExpr (state=0x27a7528, econtext=0x27a4ce0, 
isnull=) at execExprInterp.c:670
fcinfo = 0x27a7a28
argnull = 0x27a7d68 ""
argno = 
op = 0x27a76f8
resultslot = 0x27a7130
innerslot = 
outerslot = 0x27a5930
scanslot = 0x0
dispatch_table = {0x608960 , 0x607e00 
, 0x607e20 , 0x607e38 
, 0x607c90 , 0x607cb0 
, 0x607cf8 , 0x607d14 
, 0x607d58 , 0x607d73 
, 0x607e50 , 0x607ea8 
, 0x607ee0 , 0x607f18 
, 0x607f30 , 0x607f80 
, 0x607fb8

Re: [HACKERS] Supporting huge pages on Windows

2017-04-04 Thread Tsunakawa, Takayuki
From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com]
> TBH, anyone who cares about security and runs Win7 or Win2k8 or newer should
> be using virtual service accounts and managed service accounts.
> 
> https://technet.microsoft.com/en-us/library/dd548356
> 
> 
> Those are more like Unix service accounts. Notably they don't need a password,
> getting rid of some of the management pain that led us to abandon the
> 'postgres' system user on Windows.
> 
> Now that older platforms are EoL and even the oldest that added this feature
> are also near EoL or in extended maintenance, I think installers should
> switch to these by default instead of using NETWORK SERVICE.
> 
> Then the issue of priv dropping would be a lesser concern anyway.

Good point!  And as I said earlier in this thread, I think managing privileges 
(adding/revoking privileges from the user account) is the DBA's or sysadmin's 
duty, and PG's removing all privileges feels overkill.

OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock Pages 
in Memory, using the attached pg_ctl.c.  Please see EnableLockPagesPrivilege() 
and its call site.  But pg_ctl -w start fails emitting the following message:

error code 1300
waiting for server to startFATAL:  could not enable "Lock pages in memory" 
privilege
HINT:  Assign "Lock pages in memory" privilege to the Windows user account 
which runs PostgreSQL.
LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

error code 1300 is ERROR_NOT_ALL_ASSIGNED, which means AdjustTokenPrivileges() 
could not enable the Lock Pages in Memory privilege.  It seems that the privilege 
cannot be enabled once it has been removed with 
CreateRestrictedToken(DISABLE_MAX_PRIVILEGE).
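
For reference, the attempted function is essentially along these lines
(a simplified sketch of the approach, not the exact code in the
attachment); the AdjustTokenPrivileges() call is the one that ends up
reporting ERROR_NOT_ALL_ASSIGNED:

static bool
EnableLockPagesPrivilege(void)
{
	HANDLE		hToken;
	TOKEN_PRIVILEGES tp;
	LUID		luid;

	if (!OpenProcessToken(GetCurrentProcess(),
						  TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
		return false;

	if (!LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luid))
		return false;

	tp.PrivilegeCount = 1;
	tp.Privileges[0].Luid = luid;
	tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;

	if (!AdjustTokenPrivileges(hToken, FALSE, &tp, 0, NULL, NULL))
		return false;

	/*
	 * AdjustTokenPrivileges() can return success while assigning nothing;
	 * this is where error code 1300 (ERROR_NOT_ALL_ASSIGNED) shows up when
	 * the privilege has already been stripped from the restricted token.
	 */
	if (GetLastError() == ERROR_NOT_ALL_ASSIGNED)
		return false;

	CloseHandle(hToken);
	return true;
}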

Regards
Takayuki Tsunakawa


/*-
 *
 * pg_ctl --- start/stops/restarts the PostgreSQL server
 *
 * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
 *
 * src/bin/pg_ctl/pg_ctl.c
 *
 *-
 */

#ifdef WIN32
/*
 * Need this to get defines for restricted tokens and jobs. And it
 * has to be set before any header from the Win32 API is loaded.
 */
#define _WIN32_WINNT 0x0501
#endif

#include "postgres_fe.h"

#include "libpq-fe.h"
#include "pqexpbuffer.h"

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#ifdef HAVE_SYS_RESOURCE_H
#include 
#include 
#endif

#include "getopt_long.h"
#include "miscadmin.h"

/* PID can be negative for standalone backend */
typedef long pgpid_t;


typedef enum
{
SMART_MODE,
FAST_MODE,
IMMEDIATE_MODE
} ShutdownMode;


typedef enum
{
NO_COMMAND = 0,
INIT_COMMAND,
START_COMMAND,
STOP_COMMAND,
RESTART_COMMAND,
RELOAD_COMMAND,
STATUS_COMMAND,
PROMOTE_COMMAND,
KILL_COMMAND,
REGISTER_COMMAND,
UNREGISTER_COMMAND,
RUN_AS_SERVICE_COMMAND
} CtlCommand;

#define DEFAULT_WAIT60

static bool do_wait = false;
static bool del_service_rid = false;
static bool wait_set = false;
static int  wait_seconds = DEFAULT_WAIT;
static bool wait_seconds_arg = false;
static bool silent_mode = false;
static ShutdownMode shutdown_mode = FAST_MODE;
static int  sig = SIGINT;   /* default */
static CtlCommand ctl_command = NO_COMMAND;
static char *pg_data = NULL;
static char *pg_config = NULL;
static char *pgdata_opt = NULL;
static char *post_opts = NULL;
static const char *progname;
static char *log_file = NULL;
static char *exec_path = NULL;
static char *event_source = NULL;
static char *register_servicename = "PostgreSQL";   /* FIXME: + version ID? */
static char *register_username = NULL;
static char *register_password = NULL;
static char *argv0 = NULL;
static bool allow_core_files = false;
static time_t start_time;

static char postopts_file[MAXPGPATH];
static char version_file[MAXPGPATH];
static char pid_file[MAXPGPATH];
static char backup_file[MAXPGPATH];
static char recovery_file[MAXPGPATH];
static char promote_file[MAXPGPATH];

#ifdef WIN32
static DWORD pgctl_start_type = SERVICE_AUTO_START;
static SERVICE_STATUS status;
static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0;
static HANDLE shutdownHandles[2];
static pid_t postmasterPID = -1;

#define shutdownEvent shutdownHandles[0]
#define postmasterProcess shutdownHandles[1]
#endif


static void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2);
static void do_advice(void);
static void do_help(void);
static void set_mode(char *modeopt);
static void set_sig(char *signame);
static void do_init(void);
static void do_start(void);
static void do_stop(void);
static void do_restart(void);
static void do_reload(void);
static void do_status(void);
static void do_promote(void);
static void do_kill(pgpid_t pid);
static void print_msg(const char *msg);
static void adjust_data

Re: [HACKERS] Parallel Append implementation

2017-04-04 Thread Ashutosh Bapat
On Wed, Apr 5, 2017 at 1:43 AM, Andres Freund  wrote:

> On 2017-04-04 08:01:32 -0400, Robert Haas wrote:
> > On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund 
> wrote:
> > > I don't think the parallel seqscan is comparable in complexity with the
> > > parallel append case.  Each worker there does the same kind of work,
> and
> > > if one of them is behind, it'll just do less.  But correct sizing will
> > > be more important with parallel-append, because with non-partial
> > > subplans the work is absolutely *not* uniform.
> >
> > Sure, that's a problem, but I think it's still absolutely necessary to
> > ramp up the maximum "effort" (in terms of number of workers)
> > logarithmically.  If you just do it by costing, the winning number of
> > workers will always be the largest number that we think we'll be able
> > to put to use - e.g. with 100 branches of relatively equal cost we'll
> > pick 100 workers.  That's not remotely sane.
>
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that.  Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems a better to me.
>
> If the goal is to compute the results of the query as fast as possible,
> and to not use more than max_parallel_per_XXX, and it's actually
> beneficial to use more workers, then we should.  Because otherwise you
> really can't use the resources available.
>

+1. I had expressed similar opinion earlier, but yours is better
articulated. Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-04-04 Thread Michael Paquier
On Tue, Apr 4, 2017 at 10:52 PM, Peter Eisentraut
 wrote:
> On 4/3/17 11:32, Andres Freund wrote:
>> That doesn't strike as particularly future proof.  We intentionally
>> leave objects behind pg_regress runs, but that only works if we actually
>> run them...
>
> I generally agree with the sentiments expressed later in this thread.
> But just to clarify what I meant here:  We don't need to run a, say,
> 1-minute serial test to load a few "left behind" objects for the
> pg_upgrade test, if we can load the same set of objects using dedicated
> scripting in say 2 seconds.  This would make both the pg_upgrade tests
> faster and would reduce the hidden dependencies in the main tests about
> which kinds of objects need to be left behind.

Making the tests run shorter while maintaining the current code
coverage is nice. But this makes test suite maintenance more
complicated, as it needs either a dedicated regression schedule or an
extra test suite where objects are created just for the sake of
pg_upgrade. This increases the risk of the test suite rotting over
time if patch makers and reviewers are not careful.
-- 
Michael




Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-04-04 Thread Michael Paquier
On Tue, Apr 4, 2017 at 9:30 PM, Stephen Frost  wrote:
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Tue, Apr 4, 2017 at 7:38 AM, Stephen Frost  wrote:
>> The patch presented here does lower the coverage we have now.
>
> I assume (perhaps mistakenly) that this statement was based on an
> analysis of before-and-after 'make coverage' runs.  Here you are saying
> that you're *not* lowering the coverage.

My apologies here; when I used the word "coverage" in my previous
emails it apparently implied that I had run the coverage tooling. But I
did not, because the proposed TAP test does exactly what test.sh is
doing:
1) Initialize the old cluster and start it.
2) Create a bunch of databases with the full range of ASCII characters.
3) Run the regression tests.
4) Take a dump of the old cluster.
5) Stop the old cluster.
6) Initialize the new cluster.
7) Run pg_upgrade.
8) Start the new cluster.
9) Take a dump from it.
10) Run the deletion script (oops, forgot this part!)

> I understand how the current pg_upgrade tests work.  I don't see
> off-hand why the TAP tests would reduce the code coverage of pg_upgrade,
> but if they do, we should be able to figure out why and correct it.

Good news is that this patch at least does not lower the bar.

>> So in short I don't think that this lack of infrastructure should be a
>> barrier for what is basically a cleanup but... I just work here.
>
> I didn't mean to imply that this patch needs to address the
> cross-version testing challenge, was merely mentioning that there's been
> some work in this area already by Andrew and that if you're interested
> in working on that problem that you should probably coordinate with him.

Sure.

> What I do think is a barrier to this patch moving forward is if it
> reduces our current code coverage testing (with the same-version
> pg_upgrade that's run in the regular regression tests).  If it doesn't,
> then great, but if it does, then the patch should be updated to fix
> that.

I did not do a coverage test first, but surely this patch needs
numbers, so here you go.

Without the patch, here is the coverage of src/bin/pg_upgrade:
  lines..: 57.7% (1311 of 2273 lines)
  functions..: 85.3% (87 of 102 functions)

And with the patch:
  lines..: 58.8% (1349 of 2294 lines)
  functions..: 85.6% (89 of 104 functions)
The coverage gets a bit higher as a couple of basic code paths like
pg_upgrade --help get looked at.
-- 
Michael


pgupgrade-tap-test-v2.patch
Description: Binary data



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-04 Thread Pavan Deolasee
On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas  wrote:

>  but
> try to access the TOAST table would be fatal; that probably would have
> deadlock hazards among other problems.


Hmm. I think you're right. We could make a copy of the heap tuple, drop the
lock and then access TOAST to handle that. Would that work?
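
Just to illustrate the idea (a rough sketch; the function and variable
names are made up and error handling is omitted):

/* copy the tuple while the buffer lock is held, then unlock and detoast */
static Datum
fetch_attr_after_unlock(Relation rel, Buffer buf, HeapTuple tup,
						int attnum, bool *isnull)
{
	HeapTuple	copy = heap_copytuple(tup);		/* private copy of the tuple */
	Datum		value;

	LockBuffer(buf, BUFFER_LOCK_UNLOCK);		/* drop the buffer lock ... */

	/* ... so it is now OK(ish) to go to the TOAST table */
	value = heap_getattr(copy, attnum, RelationGetDescr(rel), isnull);
	if (*isnull)
		return (Datum) 0;

	return PointerGetDatum(PG_DETOAST_DATUM(value));
}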

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tatsuo Ishii
> Hmm. IMO, that could happen even with the current statement timeout
> implementation as well.
> 
> Or we could start/stop the timeout in exec_execute_message() only. This
> could avoid the problem above. Also this is more consistent with
> log_duration/log_min_duration_statement behavior than now.

I think it's better to include Parse and Bind as it is now.  Parse or Bind 
could take a long time when the table has many partitions, the query is 
complex and/or very long, some DDL statement is running against a target 
table, or the system load is high.

Firing the statement timeout during or after many Parses is not a problem, 
because the first Parse started running some statement and it has not 
finished.  Plus, Andres confirmed that major client drivers don't use such a 
pattern.  There may be odd behavior purely from the perspective of the 
extended query protocol, but that's a compromise, which is what Andres meant 
by "not perfect but."

Regards
Takayuki Tsunakawa





Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
> On 2017-04-04 16:56:26 -0700, 'Andres Freund' wrote:
>> On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote:
>> > From: Andres Freund [mailto:and...@anarazel.de]
>> > > Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs 
>> > > and
>> > > npgsql doing it seems like some evidence that it's ok.
>> > 
>> > And psqlODBC now uses always libpq.
>> > 
>> > Now time for final decision?
>> 
>> I'll send an updated patch in a bit, and then will wait till tomorrow to
>> push. Giving others a chance to chime in seems fair.
> 
> Attached.  I did not like that the previous patch had the timeout
> handling duplicated in the individual functions, I instead centralized
> it into start_xact_command().  Given that it already activated the
> timeout in the most common cases, that seems to make more sense to
> me. In your version we'd have called enable_statement_timeout() twice
> consecutively (which behaviourally is harmless).
> 
> What do you think?  I've not really tested this with the extended
> protocol, so I'd appreciate if you could rerun your test from the
> older thread.

Ok, I will look into your patch and test it out using pgproto.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread 'Andres Freund'
On 2017-04-04 16:56:26 -0700, 'Andres Freund' wrote:
> On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote:
> > From: Andres Freund [mailto:and...@anarazel.de]
> > > Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs and
> > > npgsql doing it seems like some evidence that it's ok.
> > 
> > And psqlODBC now uses always libpq.
> > 
> > Now time for final decision?
> 
> I'll send an updated patch in a bit, and then will wait till tomorrow to
> push. Giving others a chance to chime in seems fair.

Attached.  I did not like that the previous patch had the timeout
handling duplicated in the individual functions, I instead centralized
it into start_xact_command().  Given that it already activated the
timeout in the most common cases, that seems to make more sense to
me. In your version we'd have called enable_statement_timeout() twice
consecutively (which behaviourally is harmless).

What do you think?  I've not really tested this with the extended
protocol, so I'd appreciate if you could rerun your test from the
older thread.

- Andres
From c47d904725f3ef0272a4540b6b5bcf0be5c1dea6 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 4 Apr 2017 18:44:42 -0700
Subject: [PATCH] Rearm statement_timeout after each executed query.

Previously statement_timeout, in the extended protocol, affected all
messages till a Sync message.  For clients that pipeline/batch
query execution that's problematic.

Instead disable timeout after each Execute message, and enable, if
necessary, the timer in start_xact_command().

Author: Tatsuo Ishii, editorialized by Andres Freund
Reviewed-By: Takayuki Tsunakawa, Andres Freund
Discussion: https://postgr.es/m/20170222.115044.1665674502985097185.t-is...@sraoss.co.jp
---
 src/backend/tcop/postgres.c | 77 ++---
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2282058c0..4ab3bd7f07 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether statement timeout timer is active.
+ */
+static bool stmt_timeout_active = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /* 
@@ -1234,7 +1241,8 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	/*
 	 * Start up a transaction command so we can run parse analysis etc. (Note
 	 * that this will normally change current memory context.) Nothing happens
-	 * if we are already in one.
+	 * if we are already in one.  This also arms the statement timeout if
+	 * necessary.
 	 */
 	start_xact_command();
 
@@ -1522,7 +1530,8 @@ exec_bind_message(StringInfo input_message)
 	/*
 	 * Start up a transaction command so we can call functions etc. (Note that
 	 * this will normally change current memory context.) Nothing happens if
-	 * we are already in one.
+	 * we are already in one.  This also arms the statement timeout if
+	 * necessary.
 	 */
 	start_xact_command();
 
@@ -2014,6 +2023,9 @@ exec_execute_message(const char *portal_name, long max_rows)
 			 * those that start or end a transaction block.
 			 */
 			CommandCounterIncrement();
+
+			/* full command has been executed, reset timeout */
+			disable_statement_timeout();
 		}
 
 		/* Send appropriate CommandComplete to client */
@@ -2443,25 +2455,27 @@ start_xact_command(void)
 	{
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
-		else
-			disable_timeout(STATEMENT_TIMEOUT, false);
-
 		xact_started = true;
 	}
+
+	/*
+	 * Start statement timeout if necessary.  Note that this'll intentionally
+	 * not reset the clock on an already started timeout, to avoid the timing
+	 * overhead when start_xact_command() is invoked repeatedly, without an
+	 * interceding finish_xact_command() (e.g. parse/bind/execute).  If that's
+	 * not desired, the timeout has to be disabled explicitly.
+	 */
+	enable_statement_timeout();
 }
 
 static void
 finish_xact_command(void)
 {
+	/* cancel active statement timeout after each command */
+	disable_statement_timeout();
+
 	if (xact_started)
 	{
-		/* Cancel any active statement timeout before committing */
-		disable_timeout(STATEMENT_TIMEOUT, false);
-
 		Co

Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 6:56 PM, Joe Conway  wrote:
> On 04/04/2017 10:02 AM, Joe Conway wrote:
>> On 04/04/2017 09:55 AM, Mike Palmiotto wrote:
>>> After some discussion off-list, I've rebased and udpated the patches.
>>> Please see attached for further review.
>>
>> Thanks -- will have another look and test on a machine with selinux
>> setup. Robert, did you want me to take responsibility to commit on this
>> or just provide review/feedback?
>
> I did some editorializing on these.
>
> In particular I did not like the approach to fixing "warning: ‘tclass’
> may be used uninitialized" and ended up just doing it the same as was
> done elsewhere in relation.c already (set tclass = 0 in the variable
> declaration). Along the way I also changed one instance of tclass from
> uint16 to uint16_t for the sake of consistency.
>
> Interestingly we figured out that the warning was present with -Og, but
> not present with -O0, -O2, or -O3.
>
> If you want to test, apply 0001a and 0001b before 0002.
>
> Any objections?

I'm guessing Tom's going to have a strong feeling about whether 0001a
is the right way to address the stdbool issue, but I don't personally
know what the right thing to do is so I will avoid opining on that
topic.

So, nope, no objections here to you committing those.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
>> What's the point of your question? What kind of problem do you expect
>> if the timeout starts only once, at the first Parse message out of a
>> bunch of Parse messages?
> 
> It's perfectly valid to send a lot of Parse messages without
> interspersed Sync or Bind/Execute message.  There'll be one timeout
> covering all of those Parse messages, which can thus lead to a timeout,
> even though nothing actually takes long individually.

Hmm. IMO, that could happen with the current statement timeout
implementation as well.

Or we could start/stop the timeout in exec_execute_message()
only. This could avoid the problem above. Also this is more consistent
with the log_duration/log_min_duration_statement behavior than it is now.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Masahiko Sawada
On Wed, Apr 5, 2017 at 1:49 AM, Robert Haas  wrote:
> On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
>  wrote:
>> Hmm, you're right.  It could be counted with a separate variable
>> initialized to 0 and incremented every time we decide to add a row to the
>> final set of sampled rows, although different implementations of
>> AcquireSampleRowsFunc have different ways of deciding if a given row will
>> be part of the final set of sampled rows.
>>
>> On the other hand, if we decide to count progress in terms of blocks as
>> you suggested afraid, I'm afraid that FDWs won't be able to report the
>> progress.
>
> I think it may be time to push this patch out to v11.  It was
> submitted one day before the start of the last CommitFest, the design
> wasn't really right, and it's not clear even now that we know what the
> right design is.  And we're pretty much out of time.
>

+1
We're still encountering design issues, and it will take more time to
find the right design, including FDW support.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-05 10:05:19 +0900, Tatsuo Ishii wrote:
> > Hm. I started to edit it, but I'm halfway coming back to my previous
> > view that this isn't necessarily ready.
> > 
> > If a client were to prepare a large number of prepared statements
> > (i.e. a lot of parse messages), this'll only start the timeout once, at
> > the first statement sent.  It's not an issue for libpq which sends a
> > sync message after each PQprepare, nor does it look to be an issue for
> > pgjdbc based on a quick look.
> > 
> > Does anybody think there might be a driver out there that sends a bunch
> > of 'parse' messages, without syncs?
> 
> What's the point of your question? What kind of problem do you expect
> if the timeout starts only once, at the first Parse message out of a
> bunch of Parse messages?

It's perfectly valid to send a lot of Parse messages without
interspersed Sync or Bind/Execute message.  There'll be one timeout
covering all of those Parse messages, which can thus lead to a timeout,
even though nothing actually takes long individually.

Greetings,

Andres Freund




[HACKERS] Outdated comments around HandleFunctionRequest

2017-04-04 Thread Andres Freund
Hi,

PostgresMain() has the following blurb for fastpath functions:

/*
 * Note: we may at this point be inside an aborted
 * transaction.  We can't throw error for that until we've
 * finished reading the function-call message, so
 * HandleFunctionRequest() must check for it after doing so.
 * Be careful not to do anything that assumes we're inside a
 * valid transaction here.
 */
and in HandleFunctionRequest() there's:

 * INPUT:
 *  In protocol version 3, postgres.c has already read the message body
 *  and will pass it in msgBuf.
 *  In old protocol, the passed msgBuf is empty and we must read the
 *  message here.

which is not true anymore.  Followed by:

/*
 * Now that we've eaten the input message, check to see if we actually
 * want to do the function call or not.  It's now safe to ereport(); we
 * won't lose sync with the frontend.
 */

which is also not really meaningful, because there's no previous code in
the function.

This largely seems to be damage from


commit 2b3a8b20c2da9f39ffecae25ab7c66974fbc0d3b
Author: Heikki Linnakangas 
Date:   2015-02-02 17:08:45 +0200

Be more careful to not lose sync in the FE/BE protocol.

Heikki?


- Andres




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
> Hm. I started to edit it, but I'm halfway coming back to my previous
> view that this isn't necessarily ready.
> 
> If a client were to prepare a large number of prepared statements
> (i.e. a lot of parse messages), this'll only start the timeout once, at
> the first statement sent.  It's not an issue for libpq which sends a
> sync message after each PQprepare, nor does it look to be an issue for
> pgjdbc based on a quick look.
> 
> Does anybody think there might be a driver out there that sends a bunch
> of 'parse' messages, without syncs?

What's the point of your question? What kind of problem do you expect
if the timeout starts only once, at the first Parse message out of a
bunch of Parse messages?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-04 Thread Tatsuo Ishii
>> Please find attached a v8 which hopefully fixes these two issues.
> Looks good to me, marking as ready for committer.

I have looked into this a little bit.

It seems the new \gset feature doesn't work with tables having
non-ASCII column names:

$ src/bin/pgbench/pgbench -t 1 -f /tmp/f test
starting vacuum...end.
gset: invalid variable name: "数字"
client 0 file 0 command 0 compound 0: error storing into var 数字
transaction type: /tmp/f
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 1
number of transactions actually processed: 0/1

This is because pgbench variable names are limited to ASCII
characters. IMO the limitation is unnecessary and should be removed. (I
know that the limitation was introduced a long time ago and the patch
author is not responsible for that.)

Anyway, now that the feature reveals the undocumented limitation, we
should at least document the limitation for \gset.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-04-04 Thread Michael Paquier
On Wed, Apr 5, 2017 at 12:39 AM, Stephen Frost  wrote:
> Committed, with those additions.

Thanks for the commit. The final result is nice.
-- 
Michael




Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-04 Thread Craig Ringer
On 5 April 2017 at 08:00, Craig Ringer  wrote:

> Taking a look at this now.

Rebased to current master with conflicts and whitespace errors fixed.
Review pending.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 71b6163734934db3160de81fd064e7d113b9122f Mon Sep 17 00:00:00 2001
From: Jim Nasby 
Date: Wed, 1 Mar 2017 15:45:51 -0600
Subject: [PATCH 1/2] Add SPI_execute_callback() and callback-based
 DestReceiver.

Instead of placing results in a tuplestore, this method of execution
uses the supplied callback when creating the Portal for a query.
---
 src/backend/executor/spi.c | 79 --
 src/backend/tcop/dest.c| 11 +++
 src/include/executor/spi.h |  4 +++
 src/include/tcop/dest.h|  1 +
 4 files changed, 85 insertions(+), 10 deletions(-)

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3a89ccd..a0af31a 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -55,7 +55,8 @@ static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan);
 
 static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
   Snapshot snapshot, Snapshot crosscheck_snapshot,
-  bool read_only, bool fire_triggers, uint64 tcount);
+  bool read_only, bool fire_triggers, uint64 tcount,
+  DestReceiver *callback);
 
 static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
 	Datum *Values, const char *Nulls);
@@ -321,7 +322,34 @@ SPI_execute(const char *src, bool read_only, long tcount)
 
 	res = _SPI_execute_plan(&plan, NULL,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
+
+	_SPI_end_call(true);
+	return res;
+}
+int
+SPI_execute_callback(const char *src, bool read_only, long tcount,
+		DestReceiver *callback)
+{
+	_SPI_plan	plan;
+	int			res;
+
+	if (src == NULL || tcount < 0)
+		return SPI_ERROR_ARGUMENT;
+
+	res = _SPI_begin_call(true);
+	if (res < 0)
+		return res;
+
+	memset(&plan, 0, sizeof(_SPI_plan));
+	plan.magic = _SPI_PLAN_MAGIC;
+	plan.cursor_options = 0;
+
+	_SPI_prepare_oneshot_plan(src, &plan);
+
+	res = _SPI_execute_plan(&plan, NULL,
+			InvalidSnapshot, InvalidSnapshot,
+			read_only, true, tcount, callback);
 
 	_SPI_end_call(true);
 	return res;
@@ -355,7 +383,34 @@ SPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls,
 			_SPI_convert_params(plan->nargs, plan->argtypes,
 Values, Nulls),
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
+
+	_SPI_end_call(true);
+	return res;
+}
+
+/* Execute a previously prepared plan with a callback Destination */
+int
+SPI_execute_plan_callback(SPIPlanPtr plan, Datum *Values, const char *Nulls,
+ bool read_only, long tcount, DestReceiver *callback)
+{
+	int			res;
+
+	if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || tcount < 0)
+		return SPI_ERROR_ARGUMENT;
+
+	if (plan->nargs > 0 && Values == NULL)
+		return SPI_ERROR_PARAM;
+
+	res = _SPI_begin_call(true);
+	if (res < 0)
+		return res;
+
+	res = _SPI_execute_plan(plan,
+			_SPI_convert_params(plan->nargs, plan->argtypes,
+Values, Nulls),
+			InvalidSnapshot, InvalidSnapshot,
+			read_only, true, tcount, callback);
 
 	_SPI_end_call(true);
 	return res;
@@ -384,7 +439,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params,
 
 	res = _SPI_execute_plan(plan, params,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
 
 	_SPI_end_call(true);
 	return res;
@@ -425,7 +480,7 @@ SPI_execute_snapshot(SPIPlanPtr plan,
 			_SPI_convert_params(plan->nargs, plan->argtypes,
 Values, Nulls),
 			snapshot, crosscheck_snapshot,
-			read_only, fire_triggers, tcount);
+			read_only, fire_triggers, tcount, NULL);
 
 	_SPI_end_call(true);
 	return res;
@@ -472,7 +527,7 @@ SPI_execute_with_args(const char *src,
 
 	res = _SPI_execute_plan(&plan, paramLI,
 			InvalidSnapshot, InvalidSnapshot,
-			read_only, true, tcount);
+			read_only, true, tcount, NULL);
 
 	_SPI_end_call(true);
 	return res;
@@ -1907,7 +1962,8 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
 static int
 _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
   Snapshot snapshot, Snapshot crosscheck_snapshot,
-  bool read_only, bool fire_triggers, uint64 tcount)
+  bool read_only, bool fire_triggers, uint64 tcount,
+  DestReceiver *callback)
 {
 	int			my_res = 0;
 	uint64		my_processed = 0;
@@ -1918,6 +1974,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
 	ErrorContextCallback spierrcontext;
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
+	DestReceiver *dest = callback;
 
 	/*
 	 * Setup error traceback support for ereport()
@@ -2037,7 +2094,6 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamLi

Re: [HACKERS] Speedup twophase transactions

2017-04-04 Thread Michael Paquier
On Tue, Mar 28, 2017 at 3:10 PM, Michael Paquier
 wrote:
> OK, done. I have just noticed that Simon has marked himself as a
> committer of this patch 24 hours ago.

For the archive's sake, this has been committed as 728bd991. Thanks Simon!
-- 
Michael




Re: [HACKERS] delta relations in AFTER triggers

2017-04-04 Thread Thomas Munro
On Wed, Apr 5, 2017 at 11:49 AM, Kevin Grittner  wrote:
> Worked on the docs some more and then pushed it.
>
> Nice job cutting the number of *.[ch] lines by 30 while adding support for
> the other three core PLs.  :-)

Great.  Thanks.  I wonder if there is some way we can automatically
include code fragments in the documentation without keeping them in
sync manually.

Now it looks like I have no more excuses for putting off reading the
papers you've shared[1][2] about incremental matview algorithms!
Looking forward to helping out with the next steps in that project if
I can.

[1] 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254&rep=rep1&type=pdf
[2] 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3208&rep=rep1&type=pdf

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-04-04 Thread Craig Ringer
On 6 March 2017 at 05:09, Jim Nasby  wrote:
> On 2/28/17 9:42 PM, Jim Nasby wrote:
>>>
>>>
>>> I'll post a plpython patch that doesn't add the output format control.
>>
>>
>> I've attached the results of that. Unfortunately the speed improvement
>> is only 27% at this point (with 999 tuples). Presumably that's
>> because it's constructing a brand new dictionary from scratch for each
>> tuple.

Taking a look at this now.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread 'Andres Freund'
On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote:
> From: Andres Freund [mailto:and...@anarazel.de]
> > Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs and
> > npgsql doing it seems like some evidence that it's ok.
> 
> And psqlODBC now uses always libpq.
> 
> Now time for final decision?

I'll send an updated patch in a bit, and then will wait till tomorrow to
push. Giving others a chance to chime in seems fair.

- Andres




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tsunakawa, Takayuki
From: Andres Freund [mailto:and...@anarazel.de]
> Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs and
> npgsql doing it seems like some evidence that it's ok.

And psqlODBC now uses always libpq.

Now time for final decision?

Regards
Takayuki Tsunakawa






Re: [HACKERS] delta relations in AFTER triggers

2017-04-04 Thread Kevin Grittner
On Mon, Apr 3, 2017 at 7:16 PM, Thomas Munro
 wrote:
> On Tue, Apr 4, 2017 at 3:41 AM, Kevin Grittner  wrote:
>> On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane  wrote:
>>> Thomas Munro  writes:
 Or perhaps the code to inject trigger data transition tables into SPI
 (a near identical code block these three patches) should be somewhere
 common so that each PLs would only need to call a function.  If so,
 where should that go?
>>>
>>> spi.c?
>>
>> Until now, trigger.c didn't know about SPI, and spi.c didn't know
>> about triggers.  The intersection was left to referencing code, like
>> PLs.  Is there any other common code among the PLs dealing with this
>> intersection?  If so, maybe a new triggerspi.c file (or
>> spitrigger.c?) would make sense.  Possibly it could make sense from
>> a code structure PoV even for a single function, but it seems kinda
>> iffy for just this function.  As far as I can see it comes down to
>> adding it to spi.c or creating a new file -- or just duplicating
>> these 30-some lines of code to every PL.
>
> Ok, how about SPI_register_trigger_data(TriggerData *)?  Or any name
> you prefer...  I didn't suggest something as specific as
> SPI_register_transition_tables because think it's plausible that
> someone might want to implement SQL standard REFERENCING OLD/NEW ROW
> AS and make that work in all PLs too one day, and this would be the
> place to do that.
>
> See attached, which adds the call to all four built-in PLs. Thoughts?

Worked on the docs some more and then pushed it.

Nice job cutting the number of *.[ch] lines by 30 while adding support for
the other three core PLs.  :-)
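
For anyone following along, the call from a PL's trigger handler ends up
looking roughly like this (a minimal sketch of my own, not copied from
any of the committed handlers; "new_rows" stands in for whatever name
the trigger's REFERENCING clause declared):

#include "postgres.h"
#include "commands/trigger.h"
#include "executor/spi.h"

static void
run_trigger_body(TriggerData *trigdata)
{
	int			rc;

	if (SPI_connect() != SPI_OK_CONNECT)
		elog(ERROR, "SPI_connect failed");

	/* make the trigger's transition tables visible to SPI queries */
	rc = SPI_register_trigger_data(trigdata);
	if (rc != SPI_OK_TD_REGISTER)
		elog(ERROR, "SPI_register_trigger_data failed: %d", rc);

	/* the transition table can now be queried by name */
	SPI_execute("SELECT count(*) FROM new_rows", true, 0);

	SPI_finish();
}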

-- 
Kevin Grittner




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-04 16:38:53 -0700, Andres Freund wrote:
> On 2017-04-04 16:10:32 +0900, Tatsuo Ishii wrote:
> > >> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute 
> > >> starts and stops the timer), then it's a concern and the patch should 
> > >> not be ready for committer.  However, the current patch is not like that 
> > >> -- it seems to do what others in this thread are expecting.
> > > 
> > > Oh, interesting - I kind of took the author's statement as, uh,
> > > authoritative ;).  A quick look over the patch confirms your
> > > understanding.
> > 
> > Yes, Tsunakawa-san is correct. Sorry for confusion.
> > 
> > > I think the code needs a few clarifying comments around this, but
> > > otherwise seems good.  Not restarting the timeout in those cases
> > > obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> > > comments should note that.
> > > 
> > > Tatsuo-san, do you want to change those, and push?  I can otherwise.
> > 
> > Andres,
> > 
> > If you don't mind, could you please fix the comments and push it.
> 
> Hm. I started to edit it, but I'm halfway coming back to my previous
> view that this isn't necessarily ready.
> 
> If a client were to prepare a large number of prepared statements
> (i.e. a lot of parse messages), this'll only start the timeout once, at
> the first statement sent.  It's not an issue for libpq which sends a
> sync message after each PQprepare, nor does it look to be an issue for
> pgjdbc based on a quick look.
> 
> Does anybody think there might be a driver out there that sends a bunch
> of 'parse' messages, without syncs?

Looks to me like npgsql doesn't do that either.  None of libpq, pgjdbs
and npgsql doing it seems like some evidence that it's ok.

- Andres




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-05 08:34:43 +0900, Tatsuo Ishii wrote:
> Andres,
> 
> >> I think the code needs a few clarifying comments around this, but
> >> otherwise seems good.  Not restarting the timeout in those cases
> >> obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> >> comments should note that.
> >> 
> >> Tatsuo-san, do you want to change those, and push?  I can otherwise.
> > 
> > Andres,
> > 
> > If you don't mind, could you please fix the comments and push it.
> 
> I have changed the comments as you suggested. If it's ok, I can push
> the patch myself (today I have time to work on this).

I'm working on the patch, and I've edited it more heavily, so please
hold off.

Changes:
I don't think the debugging statements are a good idea, the
!xact_started should be an assert, and disable_timeout should be called
from within enable_statement_timeout independent of stmt_timer_started.


But more importantly I had just sent a question that I think merits
discussion.


- Andres




Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Andres Freund
On 2017-04-04 16:10:32 +0900, Tatsuo Ishii wrote:
> >> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute 
> >> starts and stops the timer), then it's a concern and the patch should not 
> >> be ready for committer.  However, the current patch is not like that -- it 
> >> seems to do what others in this thread are expecting.
> > 
> > Oh, interesting - I kind of took the author's statement as, uh,
> > authoritative ;).  A quick look over the patch confirms your
> > understanding.
> 
> Yes, Tsunakawa-san is correct. Sorry for confusion.
> 
> > I think the code needs a few clarifying comments around this, but
> > otherwise seems good.  Not restarting the timeout in those cases
> > obviously isn't entirely "perfect"/"correct", but a tradeoff - the
> > comments should note that.
> > 
> > Tatsuo-san, do you want to change those, and push?  I can otherwise.
> 
> Andres,
> 
> If you don't mind, could you please fix the comments and push it.

Hm. I started to edit it, but I'm halfway coming back to my previous
view that this isn't necessarily ready.

If a client were to prepare a large number of prepared statements
(i.e. a lot of parse messages), this'll only start the timeout once, at
the first statement sent.  It's not an issue for libpq which sends a
sync message after each PQprepare, nor does it look to be an issue for
pgjdbc based on a quick look.

Does anybody think there might be a driver out there that sends a bunch
of 'parse' messages, without syncs?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement timeout behavior in extended queries

2017-04-04 Thread Tatsuo Ishii
Andres,

>> I think the code needs a few clarifying comments around this, but
>> otherwise seems good.  Not restarting the timeout in those cases
>> obviously isn't entirely "perfect"/"correct", but a tradeoff - the
>> comments should note that.
>> 
>> Tatsuo-san, do you want to change those, and push?  I can otherwise.
> 
> Andres,
> 
> If you don't mind, could you please fix the comments and push it.

I have changed the comments as you suggested. If it's ok, I can push
the patch myself (today I have time to work on this).

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2282058..1fd93359 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -149,6 +149,11 @@ static bool doing_extended_query_message = false;
 static bool ignore_till_sync = false;
 
 /*
+ * Flag to keep track of whether we have started statement timeout timer.
+ */
+static bool stmt_timer_started = false;
+
+/*
  * If an unnamed prepared statement exists, it's stored here.
  * We keep it separate from the hashtable kept by commands/prepare.c
  * in order to reduce overhead for short-lived queries.
@@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
 static void SigHupHandler(SIGNAL_ARGS);
 static void log_disconnections(int code, Datum arg);
+static void enable_statement_timeout(void);
+static void disable_statement_timeout(void);
 
 
 /* 
@@ -1239,6 +1246,15 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	start_xact_command();
 
 	/*
+	 * Set statement timeout running if it's not already set. The timer will
+	 * be reset in a subsequent execute message. Ideally the timer should be
+	 * reset in this function so that each parse/bind/execute message catches
+	 * the timeout, but it needs gettimeofday() call for each, which is too
+	 * expensive.
+	 */
+	enable_statement_timeout();
+
+	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 *
 	 * We have two strategies depending on whether the prepared statement is
@@ -1526,6 +1542,15 @@ exec_bind_message(StringInfo input_message)
 	 */
 	start_xact_command();
 
+	/*
+	 * Set statement timeout running if it's not already set. The timer will
+	 * be reset in a subsequent execute message. Ideally the timer should be
+	 * reset in this function so that each parse/bind/execute message catches
+	 * the timeout, but it needs gettimeofday() call for each, which is too
+	 * expensive.
+	 */
+	enable_statement_timeout();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -1942,6 +1967,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 	start_xact_command();
 
 	/*
+	 * Set statement timeout running if it's not already set.
+	 */
+	enable_statement_timeout();
+
+	/*
 	 * If we re-issue an Execute protocol request against an existing portal,
 	 * then we are only fetching more rows rather than completely re-executing
 	 * the query from the start. atStart is never reset for a v3 portal, so we
@@ -2014,6 +2044,11 @@ exec_execute_message(const char *portal_name, long max_rows)
 			 * those that start or end a transaction block.
 			 */
 			CommandCounterIncrement();
+
+			/*
+			 * We need to reset statement timeout if already set.
+			 */
+			disable_statement_timeout();
 		}
 
 		/* Send appropriate CommandComplete to client */
@@ -2443,14 +2478,10 @@ start_xact_command(void)
 	{
 		StartTransactionCommand();
 
-		/* Set statement timeout running, if any */
-		/* NB: this mustn't be enabled until we are within an xact */
-		if (StatementTimeout > 0)
-			enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
-		else
-			disable_timeout(STATEMENT_TIMEOUT, false);
-
 		xact_started = true;
+
+		/* Set statement timeout running, if any */
+		enable_statement_timeout();
 	}
 }
 
@@ -2460,7 +2491,7 @@ finish_xact_command(void)
 	if (xact_started)
 	{
 		/* Cancel any active statement timeout before committing */
-		disable_timeout(STATEMENT_TIMEOUT, false);
+		disable_statement_timeout();
 
 		CommitTransactionCommand();
 
@@ -4511,3 +4542,53 @@ log_disconnections(int code, Datum arg)
 	port->user_name, port->database_name, port->remote_host,
   port->remote_port[0] ? " port=" : "", port->remote_port)));
 }
+
+/*
+ * Set statement timeout if any.
+ */
+static void
+enable_statement_timeout(void)
+{
+	if (!stmt_timer_started)
+	{
+		if (StatementTimeout > 0)
+		{
+
+			/*
+			 * Sanity check
+			 */
+			if (!xact_started)
+			{
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("Transaction must have been already started to set statement timeout")));
+			}
+
+			ereport(DEBUG3,
+	(errmsg_internal("Set statement timeout")));
+
+			enable_timeout_after(STATEMENT_TIMEOUT, St

Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Joe Conway
On 04/04/2017 10:02 AM, Joe Conway wrote:
> On 04/04/2017 09:55 AM, Mike Palmiotto wrote:
>> After some discussion off-list, I've rebased and updated the patches.
>> Please see attached for further review.
> 
> Thanks -- will have another look and test on a machine with selinux
> setup. Robert, did you want me to take responsibility to commit on this
> or just provide review/feedback?

I did some editorializing on these.

In particular I did not like the approach to fixing "warning: ‘tclass’
may be used uninitialized" and ended up just doing it the same as was
done elsewhere in relation.c already (set tclass = 0 in the variable
declaration). Along the way I also changed one instance of tclass from
uint16 to uint16_t for the sake of consistency.

Interestingly we figured out that the warning was present with -Og, but
not present with -O0, -O2, or -O3.

If you want to test, apply 0001a and 0001b before 0002.

Any objections?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 1a8f884..320c098 100644
*** a/contrib/sepgsql/label.c
--- b/contrib/sepgsql/label.c
***
*** 10,15 
--- 10,25 
   */
  #include "postgres.h"
  
+ #include <selinux/label.h>
+ /*
+  * selinux/label.h includes stdbool.h, which redefines bool, so
+  * revert to the postgres definition of bool from c.h
+  */
+ #ifdef bool
+ #undef bool
+ typedef char bool;
+ #endif
+ 
  #include "access/heapam.h"
  #include "access/htup_details.h"
  #include "access/genam.h"
***
*** 37,44 
  
  #include "sepgsql.h"
  
- #include <selinux/label.h>
- 
  /*
   * Saved hook entries (if stacked)
   */
--- 47,52 
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index ab98a9b..2ea6bfb 100644
*** a/contrib/sepgsql/relation.c
--- b/contrib/sepgsql/relation.c
*** sepgsql_relation_post_create(Oid relOid)
*** 243,249 
  	HeapTuple	tuple;
  	Form_pg_class classForm;
  	ObjectAddress object;
! 	uint16		tclass;
  	char	   *scontext;		/* subject */
  	char	   *tcontext;		/* schema */
  	char	   *rcontext;		/* relation */
--- 243,249 
  	HeapTuple	tuple;
  	Form_pg_class classForm;
  	ObjectAddress object;
! 	uint16_t	tclass;
  	char	   *scontext;		/* subject */
  	char	   *tcontext;		/* schema */
  	char	   *rcontext;		/* relation */
*** sepgsql_relation_drop(Oid relOid)
*** 413,419 
  {
  	ObjectAddress object;
  	char	   *audit_name;
! 	uint16_t	tclass;
  	char		relkind;
  
  	relkind = get_rel_relkind(relOid);
--- 413,419 
  {
  	ObjectAddress object;
  	char	   *audit_name;
! 	uint16_t	tclass = 0;
  	char		relkind;
  
  	relkind = get_rel_relkind(relOid);
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 1a8f884..4dda82a 100644
*** a/contrib/sepgsql/label.c
--- b/contrib/sepgsql/label.c
*** exec_object_restorecon(struct selabel_ha
*** 779,785 
  			case RelationRelationId:
  relForm = (Form_pg_class) GETSTRUCT(tuple);
  
! if (relForm->relkind == RELKIND_RELATION)
  	objtype = SELABEL_DB_TABLE;
  else if (relForm->relkind == RELKIND_SEQUENCE)
  	objtype = SELABEL_DB_SEQUENCE;
--- 779,786 
  			case RelationRelationId:
  relForm = (Form_pg_class) GETSTRUCT(tuple);
  
! if (relForm->relkind == RELKIND_RELATION ||
! 	relForm->relkind == RELKIND_PARTITIONED_TABLE)
  	objtype = SELABEL_DB_TABLE;
  else if (relForm->relkind == RELKIND_SEQUENCE)
  	objtype = SELABEL_DB_SEQUENCE;
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index ab98a9b..f8689c0 100644
*** a/contrib/sepgsql/relation.c
--- b/contrib/sepgsql/relation.c
*** sepgsql_attribute_post_create(Oid relOid
*** 54,65 
  	ObjectAddress object;
  	Form_pg_attribute attForm;
  	StringInfoData audit_name;
  
  	/*
! 	 * Only attributes within regular relation have individual security
! 	 * labels.
  	 */
! 	if (get_rel_relkind(relOid) != RELKIND_RELATION)
  		return;
  
  	/*
--- 54,66 
  	ObjectAddress object;
  	Form_pg_attribute attForm;
  	StringInfoData audit_name;
+ 	char		relkind = get_rel_relkind(relOid);
  
  	/*
! 	 * Only attributes within regular relations or partition relations have
! 	 * individual security labels.
  	 */
! 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
  		return;
  
  	/*
*** sepgsql_attribute_drop(Oid relOid, AttrN
*** 135,142 
  {
  	ObjectAddress object;
  	char	   *audit_name;
  
! 	if (get_rel_relkind(relOid) != RELKIND_RELATION)
  		return;
  
  	/*
--- 136,144 
  {
  	ObjectAddress object;
  	char	   *audit_name;
+ 	char		relkind = get_rel_relkind(relOid);
  
! 	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
  		return;
  
  	/*
*** sepgsql_attribute_relabel(Oid relOid, At
*** 167,174 
  {
  	ObjectAddress object;
  	char	   *audit_name;
  
! 	if (get_rel_rel

Re: [HACKERS] increasing the default WAL segment size

2017-04-04 Thread Simon Riggs
On 27 March 2017 at 15:36, Beena Emerson  wrote:

> 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
> the internal representation of max_wal_size and min_wal_size to mb.

Committed first part to allow internal representation change (only).

No commitment yet to increasing wal-segsize in the way this patch has it.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-04 Thread Heikki Linnakangas

On 04/04/2017 01:52 PM, Heikki Linnakangas wrote:

On 03/31/2017 10:10 AM, Michael Paquier wrote:

On Wed, Mar 8, 2017 at 10:39 PM, Robert Haas  wrote:

On Tue, Mar 7, 2017 at 10:01 PM, Michael Paquier
 wrote:
I kinda hope Heikki is going to step up to the plate here, because I
think he understands most of it already.


The second person who knows a good deal about NFKC is Ishii-san.

Attached is a rebased patch.


Thanks, I'm looking at this now.


I will continue tomorrow, but I wanted to report on what I've done so 
far. Attached is a new patch version, quite heavily modified. Notable 
changes so far:


* Use Unicode codepoints, rather than UTF-8 bytes packed in a 32-bit 
ints. IMHO this makes the tables easier to read (to a human), and they 
are also packed slightly more tightly (see next two points), as you can 
fit more codepoints in a 16-bit integer.


* Reorganize the decomposition table. Instead of a separate 
binary-searched table for each "size", have one table that's ordered by 
codepoint, which contains a length and offset into another array, where 
all the decomposed sequences are. This makes the recomposition step 
somewhat slower, as it now has to grovel through an array that contains 
all decompositions, rather than just the array that contains 
decompositions of two characters. But this isn't performance-critical, 
readability and tight packing of the tables matter more.


* In the lookup table, store singleton decompositions (those that decompose 
to a single character that fits in a uint16) directly in the main lookup 
table, instead of storing an index into the other table. This makes the 
tables somewhat smaller, as there are a lot of such decompositions.
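
In other words the main table ends up looking roughly like this (field 
names here are only illustrative; the generated header in the patch is 
authoritative):

typedef struct
{
    uint32      codepoint;      /* the table is sorted on this */
    uint8       dec_size_flags; /* decomposition length, plus flag bits */
    uint16      dec_index;      /* offset into the shared array of decomposed
                                 * sequences, or, for a singleton whose
                                 * decomposition fits in a uint16, the
                                 * decomposed character itself */
} pg_unicode_decomposition;

/* plus one shared array holding all decomposed sequences back to back */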


* Use uint32 instead of pg_wchar to represent unicode codepoints. 
pg_wchar suggests something that holds multi-byte characters in the OS 
locale, used by functions like wcscmp, so avoid that. I added a "typedef 
uint32 Codepoint" alias, though, to make it more clear which variables 
hold Unicode codepoints rather than e.g. indexes into arrays.


* Unicode consortium publishes a comprehensive normalization test suite, 
as a text file, NormalizationTest.txt. I wrote a small perl and C 
program to run all the tests from that test suite, with the 
normalization function. That uncovered a number of bugs in the 
recomposition algorithm, which I then fixed:


 * decompositions that map to ascii characters cannot be excluded.
 * non-canonical and non-starter decompositions need to be excluded. 
They are now flagged in the lookup table, so that we only use them 
during the decomposition phase, not when recomposing.



TODOs / Discussion points:

* I'm not sure I like the way the code is organized, I feel that we're 
littering src/common with these. Perhaps we should add a 
src/common/unicode_normalization directory for all this.


* The list of characters excluded from recomposition is currently 
hard-coded in utf_norm_generate.pl. However, that list is available in 
machine-readable format, in file CompositionExclusions.txt. Since we're 
reading most of the data from UnicodeData.txt, would be good to read the 
exclusion table from a file, too.


* SASLPrep specifies normalization form KC, but it also specifies that 
some characters are mapped to space or nothing. Should do those 
mappings, too.


- Heikki


implement-SASLprep-2.patch.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED

2017-04-04 Thread Andres Freund
On 2017-04-04 23:23:30 +0200, Tomas Vondra wrote:
> On 04/04/2017 10:42 PM, Tomas Vondra wrote:
> > Hi,
> > 
> > Andres nagged me about valgrind runs taking much longer since
> > 9fab40ad introduced the SlabContext into reorderbuffer.c. And by
> > "longer" I mean hours instead of minutes.
> > 
> > After a bit of investigation I stumbled on this line in slab.c:
> > 
> >   VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));
> > 
> > which is wrong, because the second parameter should be size and not
> > another pointer. This essentially marks a lot of memory as defined,
> > which results in the extreme test runtime.
> > 
> > Instead, the line should be:
> > 
> >   VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));
> > 
> > Patch attached.
> > 
> 
> Turns out SlabCheck() was missing VALGRIND_MAKE_MEM_DEFINED, resulting in a
> valgrind failure with --enable-cassert. Updated patch version attached,
> passing all tests in test_decoding.

Pushed, and re-enabled TestDecoding on skink.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED

2017-04-04 Thread Tomas Vondra

On 04/04/2017 10:42 PM, Tomas Vondra wrote:

Hi,

Andres nagged me about valgrind runs taking much longer since
9fab40ad introduced the SlabContext into reorderbuffer.c. And by
"longer" I mean hours instead of minutes.

After a bit of investigation I stumbled on this line in slab.c:

  VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));

which is wrong, because the second parameter should be size and not
another pointer. This essentially marks a lot of memory as defined,
which results in the extreme test runtime.

Instead, the line should be:

  VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));

Patch attached.



Turns out SlabCheck() was missing VALGRIND_MAKE_MEM_DEFINED, resulting 
in a valgrind failure with --enable-cassert. Updated patch version 
attached, passing all tests in test_decoding.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-valgrind-fix-v2.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-04-04 Thread Keith Fiske
On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed  wrote:

> Hello,
>
> Please find attached an updated patch.
> Following has been accomplished in this update:
>
> 1. A new partition can be added after default partition if there are no
> conflicting rows in default partition.
> 2. Solved the crash reported earlier.
>
> Thank you,
> Rahila Syed
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
Installed and compiled against commit
60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017
-0400) without any issue

However, running your original example, I'm getting this error

keith@keith=# CREATE TABLE list_partitioned (
keith(# a int
keith(# ) PARTITION BY LIST (a);
CREATE TABLE
Time: 4.933 ms
keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR
VALUES IN (DEFAULT);
CREATE TABLE
Time: 3.492 ms
keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES
IN (4,5);
ERROR:  unrecognized node type: 216
Time: 0.979 ms

Also, I'm still of the opinion that denying future partitions for values
already in the default would be a cause of confusion. Moving the data out of
the default and into a proper child would require first removing that
data from the default, storing it elsewhere, creating the child, then
moving it back. If it's only a small amount of data it may not be a big
deal, but if it's a large amount, that could cause quite a lot of
contention if done in a single transaction. Either that or the user would
have to deal with data existing in the table, disappearing, then
reappearing again.

This also makes it harder to migrate an existing table easily. Essentially
no child tables for a large, existing data set could ever be created
without going through one of the two situations above.

However, thinking through this, I'm not sure I can see a solution without
the global index support. If this restriction is removed, there's still an
issue of data duplication after the necessary child table is added. So I
guess it's a matter of deciding which user experience is better for the
moment?

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-04-04 Thread Andrew Dunstan


On 04/03/2017 05:17 PM, Andres Freund wrote:
> On 2017-03-21 14:31:08 -0400, Andrew Dunstan wrote:
>>
>> On 03/21/2017 01:37 PM, David Steele wrote:
>>> On 3/16/17 11:54 AM, David Steele wrote:
 On 2/1/17 12:53 AM, Michael Paquier wrote:
> On Thu, Jan 26, 2017 at 6:49 AM, Tom Lane  wrote:
>> Nikita Glukhov  writes:
>>> On 25.01.2017 23:58, Tom Lane wrote:
 I think you need to take a second look at the code you're producing
 and realize that it's not so clean either.  This extract from
 populate_record_field, for example, is pretty horrid:
>>> But what if we introduce some helper macros like this:
>>> #define JsValueIsNull(jsv) \
>>>  ((jsv)->is_json ? !(jsv)->val.json.str \
>>>  : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull)
>>> #define JsValueIsString(jsv) \
>>>  ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \
>>>  : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString)
>> Yeah, I was wondering about that too.  I'm not sure that you can make
>> a reasonable set of helper macros that will fix this, but if you want
>> to try, go for it.
>>
>> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have
>> to go back to the manual every darn time to convince myself whether
>> that means (a?b:c)||d or a?b:(c||d).  It's better not to rely on
>> the reader (... or the author) having memorized C's precedence rules
>> in quite that much detail.  Extra parens help.
> Moved to CF 2017-03 as discussion is going on and more review is
> needed on the last set of patches.
>
 The current patches do not apply cleanly at cccbdde:

 $ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch
 error: patch failed: src/backend/utils/adt/jsonb_util.c:328
 error: src/backend/utils/adt/jsonb_util.c: patch does not apply
 error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266
 error: src/backend/utils/adt/jsonfuncs.c: patch does not apply
 error: patch failed: src/include/utils/jsonb.h:218
 error: src/include/utils/jsonb.h: patch does not apply

 In addition, it appears a number of suggestions have been made by Tom
 that warrant new patches.  Marked "Waiting on Author".
>>> This thread has been idle for months since Tom's review.
>>>
>>> The submission has been marked "Returned with Feedback".  Please feel
>>> free to resubmit to a future commitfest.
>>>
>>>
>> Please revive. I am planning to look at this later this week.
> Since that was 10 days ago - you're still planning to get this in before
> the freeze?
>


Hoping to, other demands have intervened a bit, but I might be able to.

cheers

andrew

-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql - add special variable to reflect the last query status

2017-04-04 Thread Pavel Stehule
2017-04-04 22:05 GMT+02:00 Fabien COELHO :

>
> After some discussions about what could be useful since psql scripts now
> accept tests, this patch sets a few variables which can be used by psql
> after a "front door" (i.e. actually typed by the user) query:
>
>  - RESULT_STATUS: the status of the query
>  - ERROR: whether the query failed
>  - ERROR_MESSAGE: ...
>  - ROW_COUNT: #rows affected
>
>  SELECT * FROM ;
>  \if :ERROR
>\echo oops
>\q
>  \endif
>
> I'm not sure that the names are right. Maybe STATUS would be better than
> RESULT_STATUS.


good ideas

Regards

Pavel

>
>
> --
> Fabien.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-04 Thread Peter Geoghegan
On Tue, Apr 4, 2017 at 1:32 PM, Andres Freund  wrote:
> s/Avoid/Allow to avoid/

WFM.

>> + *
>> + * Callers cannot rely on memory for tuple in returned slot remaining valid
>> + * past any subsequent manipulation of the sorter, such as another fetch of
>> + * input from sorter.  (The sorter may recycle memory.)
>>   */
>
> It's not just the sorter, right? I'd just rephrase that the caller
> cannot rely on tuple to stay valid beyond a single call.

It started that way, but changed on the basis of Tom's feedback. I
would be equally happy with either wording.

>>  static bool
>>  tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
>> @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, 
>> bool forward,
>>   * NULL value in leading attribute will set abbreviated value to zeroed
>>   * representation, which caller may rely on in abbreviated inequality check.
>>   *
>> - * The slot receives a copied tuple (sometimes allocated in caller memory
>> - * context) that will stay valid regardless of future manipulations of the
>> - * tuplesort's state.
>> + * If copy is TRUE, the slot receives a copied tuple that will stay valid
>> + * regardless of future manipulations of the tuplesort's state.  Memory is
>> + * owned by the caller.  If copy is FALSE, the slot will just receive a
>> + * pointer to a tuple held within the tuplesort, which is more efficient,
>> + * but only safe for callers that are prepared to have any subsequent
>> + * manipulation of the tuplesort's state invalidate slot contents.
>>   */
>
> Hm. Do we need to note anything about how long caller memory needs to
> live? I think we'd get into trouble if the caller does a memory context
> reset, without also clearing the slot?

Since we arrange to have caller explicitly own memory (it's always in
their own memory context (current context), which is not the case with
other similar routines), it's 100% the caller's problem. As I assume
you know, convention in executor around resource management of memory
like this is pretty centralized within ExecStoreTuple(), and nearby
routines. In general, the executor is more or less used to having this
be its problem alone, and is pessimistic about memory lifetime unless
otherwise noted.
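
To make the contract concrete, a caller opting out of the copy looks
roughly like this (sketch only; the wrapper name is made up):

static void
drain_sorted_output(Tuplesortstate *state, TupleTableSlot *slot)
{
    /* copy = false: no per-tuple copy, but the slot's contents are only
     * good until the next call into the tuplesort, which may recycle
     * the memory */
    while (tuplesort_gettupleslot(state, true, false, slot, NULL))
    {
        /* consume the tuple right here; keep no pointers across calls */
    }

    /* with copy = true the tuple would instead be palloc'd in the
     * caller's current memory context and remain valid afterwards */
}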

> Other than these minimal adjustments, this looks good to go to me.

Cool.

I'll try to get out a revision soon, maybe later today, including an
updated 0002-* (Valgrind suppression), which I have not forgotten
about.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED

2017-04-04 Thread Tomas Vondra

Hi,

Andres nagged me about valgrind runs taking much longer since 
9fab40ad introduced the SlabContext into reorderbuffer.c. And by 
"longer" I mean hours instead of minutes.


After a bit of investigation I stumbled on this line in slab.c:

  VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk));

which is wrong, because the second parameter should be size and not 
another pointer. This essentially marks a lot of memory as defined, 
which results in the extreme test runtime.


Instead, the line should be:

  VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));

Patch attached.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-valgrind-fix.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument

2017-04-04 Thread Andres Freund
On 2017-03-13 18:14:07 -0700, Peter Geoghegan wrote:
> From 5351b5db257cb39832647d9117465c0217e6268b Mon Sep 17 00:00:00 2001
> From: Peter Geoghegan 
> Date: Thu, 13 Oct 2016 10:54:31 -0700
> Subject: [PATCH 1/2] Avoid copying within tuplesort_gettupleslot().

s/Avoid/Allow to avoid/

> Add a "copy" argument to make it optional to receive a copy of caller
> tuple that is safe to use following a subsequent manipulating of
> tuplesort's state.  This is a performance optimization.  Most existing
> tuplesort_gettupleslot() callers are made to opt out of copying.
> Existing callers that happen to rely on the validity of tuple memory
> beyond subsequent manipulations of the tuplesort request their own copy.
> 
> This brings tuplesort_gettupleslot() in line with
> tuplestore_gettupleslot().  In the future, a "copy" tuplesort_getdatum()
> argument may be added, that similarly allows callers to opt out of
> receiving their own copy of tuple.
> 
> In passing, clarify assumptions that callers of other tuplesort fetch
> routines may make about tuple memory validity, per gripe from Tom Lane.
> ---
>  src/backend/executor/nodeAgg.c | 10 +++---
>  src/backend/executor/nodeSort.c|  5 +++--
>  src/backend/utils/adt/orderedsetaggs.c |  5 +++--
>  src/backend/utils/sort/tuplesort.c | 28 +---
>  src/include/utils/tuplesort.h  |  2 +-
>  5 files changed, 31 insertions(+), 19 deletions(-)
> 
> diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
> index aa08152..b650f71 100644
> --- a/src/backend/executor/nodeAgg.c
> +++ b/src/backend/executor/nodeAgg.c
> @@ -570,6 +570,10 @@ initialize_phase(AggState *aggstate, int newphase)
>   * Fetch a tuple from either the outer plan (for phase 0) or from the sorter
>   * populated by the previous phase.  Copy it to the sorter for the next phase
>   * if any.
> + *
> + * Callers cannot rely on memory for tuple in returned slot remaining valid
> + * past any subsequent manipulation of the sorter, such as another fetch of
> + * input from sorter.  (The sorter may recycle memory.)
>   */

It's not just the sorter, right? I'd just rephrase that the caller
cannot rely on tuple to stay valid beyond a single call.



>  static bool
>  tuplesort_gettuple_common(Tuplesortstate *state, bool forward,
> @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool 
> forward,
>   * NULL value in leading attribute will set abbreviated value to zeroed
>   * representation, which caller may rely on in abbreviated inequality check.
>   *
> - * The slot receives a copied tuple (sometimes allocated in caller memory
> - * context) that will stay valid regardless of future manipulations of the
> - * tuplesort's state.
> + * If copy is TRUE, the slot receives a copied tuple that will stay valid
> + * regardless of future manipulations of the tuplesort's state.  Memory is
> + * owned by the caller.  If copy is FALSE, the slot will just receive a
> + * pointer to a tuple held within the tuplesort, which is more efficient,
> + * but only safe for callers that are prepared to have any subsequent
> + * manipulation of the tuplesort's state invalidate slot contents.
>   */

Hm. Do we need to note anything about how long caller memory needs to
live? I think we'd get into trouble if the caller does a memory context
reset, without also clearing the slot?


>  bool
> -tuplesort_gettupleslot(Tuplesortstate *state, bool forward,
> +tuplesort_gettupleslot(Tuplesortstate *state, bool forward, bool copy,
>  TupleTableSlot *slot, Datum *abbrev)
>  {
>   MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
> @@ -2113,8 +2117,10 @@ tuplesort_gettupleslot(Tuplesortstate *state, bool 
> forward,
>   if (state->sortKeys->abbrev_converter && abbrev)
>   *abbrev = stup.datum1;
>  
> - stup.tuple = heap_copy_minimal_tuple((MinimalTuple) stup.tuple);
> - ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, true);
> + if (copy)
> + stup.tuple = heap_copy_minimal_tuple((MinimalTuple) 
> stup.tuple);
> +
> + ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, copy);
>   return true;


Other than these minimal adjustments, this looks good to go to me.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding on standby

2017-04-04 Thread Andres Freund
On 2017-04-04 22:32:40 +0800, Craig Ringer wrote:
> I'm much happier with this. I'm still fixing some issues in the tests
> for 03 and tidying them up, but 03 should allow 01 and 02 to be
> reviewed in their proper context now.

To me this very clearly is too late for v10, and now should be moved to
the next CF.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-04-04 Thread Andres Freund
On 2017-04-04 08:01:32 -0400, Robert Haas wrote:
> On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund  wrote:
> > I don't think the parallel seqscan is comparable in complexity with the
> > parallel append case.  Each worker there does the same kind of work, and
> > if one of them is behind, it'll just do less.  But correct sizing will
> > be more important with parallel-append, because with non-partial
> > subplans the work is absolutely *not* uniform.
>
> Sure, that's a problem, but I think it's still absolutely necessary to
> ramp up the maximum "effort" (in terms of number of workers)
> logarithmically.  If you just do it by costing, the winning number of
> workers will always be the largest number that we think we'll be able
> to put to use - e.g. with 100 branches of relatively equal cost we'll
> pick 100 workers.  That's not remotely sane.

I'm quite unconvinced that just throwing a log() in there is the best
way to combat that.  Modeling the issue of starting more workers through
tuple transfer, locking and startup overhead costing seems better to me.
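
I.e., roughly, charge the existing knobs per candidate worker count and
keep whatever comes out cheapest (illustrative fragment only; variable
names made up):

    /* for each candidate nworkers, roughly: */
    startup_cost += parallel_setup_cost;
    run_cost += parallel_tuple_cost * rows_returned_to_leader;
    /* ... then keep the nworkers with the lowest resulting total cost,
     * bounded by the max_parallel_workers_per_gather limit */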

If the goal is to compute the results of the query as fast as possible,
and to not use more than max_parallel_per_XXX, and it's actually
beneficial to use more workers, then we should.  Because otherwise you
really can't use the resources available.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql - add special variable to reflect the last query status

2017-04-04 Thread Fabien COELHO


After some discussions about what could be useful since psql scripts now 
accept tests, this patch sets a few variables which can be used by psql 
after a "front door" (i.e. actually typed by the user) query:


 - RESULT_STATUS: the status of the query
 - ERROR: whether the query failed
 - ERROR_MESSAGE: ...
 - ROW_COUNT: #rows affected

 SELECT * FROM ;
 \if :ERROR
   \echo oops
   \q
 \endif

I'm not sure that the names are right. Maybe STATUS would be better than 
RESULT_STATUS.


--
Fabien.diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 3b86612..7006f23 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3449,6 +3449,24 @@ bar
   
 
   
+   ERROR
+   
+
+ Whether the last query failed.
+
+   
+  
+
+  
+   ERROR_MESSAGE
+   
+
+ If the last query failed, the associated error message.
+
+   
+  
+
+  
 FETCH_COUNT
 
 
@@ -3653,6 +3671,25 @@ bar
   
 
   
+   RESULT_STATUS
+   
+
+ Text status of the last query.
+
+   
+  
+
+  
+   ROW_COUNT
+   
+
+ When appropriate, how many rows were returned or affected by the
+ last query.
+
+   
+  
+
+  
 QUIET
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a2f1259..74d22fb 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1213,6 +1213,57 @@ PrintQueryResults(PGresult *results)
 	return success;
 }
 
+/*
+ * Set special variables for "front door" queries
+ * - RESULT_STATUS: last query status
+ * - ERROR: TRUE/FALSE, whether an error occurred
+ * - ERROR_MESSAGE: message if an error occurred
+ * - ROW_COUNT: how many rows were returned or affected, if appropriate
+ */
+static void
+SetResultVariables(PGresult *results)
+{
+	bool			success;
+	ExecStatusType	restat = PQresultStatus(results);
+
+	SetVariable(pset.vars, "RESULT_STATUS", PQresStatus(restat));
+
+	switch (restat)
+	{
+		case PGRES_EMPTY_QUERY:
+		case PGRES_TUPLES_OK:
+		case PGRES_COMMAND_OK:
+		case PGRES_COPY_OUT:
+		case PGRES_COPY_IN:
+		case PGRES_COPY_BOTH:
+		case PGRES_SINGLE_TUPLE:
+			success = true;
+			SetVariable(pset.vars, "ERROR", "FALSE");
+			SetVariable(pset.vars, "ERROR_MESSAGE", NULL);
+			break;
+		case PGRES_BAD_RESPONSE:
+		case PGRES_NONFATAL_ERROR:
+		case PGRES_FATAL_ERROR:
+			success = false;
+			SetVariable(pset.vars, "ERROR", "TRUE");
+			SetVariable(pset.vars, "ERROR_MESSAGE", PQerrorMessage(pset.db));
+			break;
+		default:
+			/* dead code */
+			success = false;
+			psql_error("unexpected PQresultStatus: %d\n", restat);
+			break;
+	}
+
+	if (success)
+	{
+		/* set variable if non empty */
+		char   *ntuples = PQcmdTuples(results);
+		SetVariable(pset.vars, "ROW_COUNT", *ntuples ? ntuples : NULL);
+	}
+	else
+		SetVariable(pset.vars, "ROW_COUNT", NULL);
+}
 
 /*
  * SendQuery: send the query string to the backend
@@ -1346,6 +1397,9 @@ SendQuery(const char *query)
 			elapsed_msec = INSTR_TIME_GET_MILLISEC(after);
 		}
 
+		/* set special variables to reflect the result status */
+		SetResultVariables(results);
+
 		/* but printing results isn't: */
 		if (OK && results)
 			OK = PrintQueryResults(results);
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index d602aee..3ee96b5 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -2904,6 +2904,36 @@ bar 'bar' "bar"
 	\echo 'should print #8-1'
 should print #8-1
 \endif
+-- special result variables
+SELECT 1 UNION SELECT 2;
+ ?column? 
+--
+1
+2
+(2 rows)
+
+\echo 'result status: ' :RESULT_STATUS
+result status:  PGRES_TUPLES_OK
+\echo 'number of rows: ' :ROW_COUNT
+number of rows:  2
+SELECT 1 UNION;
+ERROR:  syntax error at or near ";"
+LINE 1: SELECT 1 UNION;
+  ^
+\echo 'result status: ' :RESULT_STATUS
+result status:  PGRES_FATAL_ERROR
+\if :ERROR
+  \echo 'Oops, an error occured...'
+Oops, an error occured...
+  \echo 'error message: ' :ERROR_MESSAGE
+error message:  ERROR:  syntax error at or near ";"
+LINE 1: SELECT 1 UNION;
+  ^
+
+\endif
+;
+\echo 'result status: ' :RESULT_STATUS
+result status:  PGRES_EMPTY_QUERY
 -- SHOW_CONTEXT
 \set SHOW_CONTEXT never
 do $$
diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql
index b56a05f..716fe06 100644
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -526,6 +526,21 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
 	\echo 'should print #8-1'
 \endif
 
+-- special result variables
+SELECT 1 UNION SELECT 2;
+\echo 'result status: ' :RESULT_STATUS
+\echo 'number of rows: ' :ROW_COUNT
+
+SELECT 1 UNION;
+\echo 'result status: ' :RESULT_STATUS
+\if :ERROR
+  \echo 'Oops, an error occured...'
+  \echo 'error message: ' :ERROR_ME

Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread David Steele
On 4/4/17 12:55 PM, Ashutosh Sharma wrote:
> 
> As I am not seeing any response from Tomas for the last 2-3 days and since
> the commit-fest is coming towards its end, I have planned to work on the
> review comments that I had given a few days back and submit the updated
> patch. PFA new version of patch that takes care of all the review
> comments given by me. I have also run pgindent on btreefuncs.c file
> and have run some basic test cases. All looks fine to me now!

Awesome, Ashutosh!

-- 
-David
da...@pgmasters.net


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2017-04-04 Thread Peter Geoghegan
On Tue, Apr 4, 2017 at 3:07 AM, Anastasia Lubennikova
 wrote:
>> * I think that we should store this (the number of attributes), and
>> use it directly when comparing, per my remarks to Tom over on that
>> other thread. We should also use the free bit within
>> IndexTupleData.t_info, to indicate that the IndexTuple was truncated,
>> just to make it clear to everyone that might care that that's how
>> these truncated IndexTuples need to be represented.
>>
>> Doing this would have no real impact on your patch, because for you
>> this will be 100% redundant. It will help external tools, and perhaps
>> another, more general suffix truncation patch that comes in the
>> future. We should try very hard to have a future-proof on-disk
>> representation. I think that this is quite possible.
>
> To be honest, I think that it'll make the patch overcomplicated, because this
> exact patch has nothing to do with suffix truncation.
> Although, we can add any necessary flags if this work will be continued in
> the future.

Yes, doing things that way would mean adding a bit more complexity to
your patch, but IMV would be worth it to have the on-disk format be
compatible with what a full suffix truncation patch will eventually
require.

Obviously I disagree with what you say here -- I think that your patch
*does* have plenty in common with suffix truncation. But, you don't
have to even agree with me on that to see why what I propose is still
a good idea. Tom Lane had a specific objection to this patch --
catalog metadata is currently necessary to interpret internal page
IndexTuples [1]. However, by storing the true number of columns in the
case of truncated tuples, we can make the situation with IndexTuples
similar enough to the existing situation with heap tuples, where the
number of attributes is available right in the header as "natts". We
don't have to rely on something like catalog metadata from a great
distance, where some caller may forget to pass through the metadata to
a lower level.

So, presumably doing things this way addresses Tom's exact objection
to the truncation aspect of this patch [2]. We have the capacity to
store something like natts "for free" -- let's use it. The lack of any
direct source of metadata was called "dangerous". As much as anything
else, I want to remove any danger.

> There is already an assertion.
> Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup));
> Do you think it is not enough?

I think that a "can't happen" check will work better in the future,
when user defined code could be involved in truncation. Any extra
overhead will be paid relatively infrequently, and will be very low.

>> * I see a small bug. You forgot to teach _bt_findsplitloc() about
>> truncation. It does this currently, which you did not update:
>>
>>  /*
>>   * The first item on the right page becomes the high key of the left
>> page;
>>   * therefore it counts against left space as well as right space.
>>   */
>>  leftfree -= firstrightitemsz;
>>
>> I think that this accounting needs to be fixed.
>
> Could you explain, what's wrong with this accounting? We may expect to take
> more space on the left page, than will be taken after highkey truncation.
> But I don't see any problem here.

Obviously it would at least be slightly better to have the actual
truncated high key size where that's expected -- not the would-be
untruncated high key size. The code as it stands might lead to a bad
choice of split point in edge-cases.

At the very least, you should change comments to note the issue. I
think it's highly unlikely that this could ever result in a failure to
find a split point, which there are many defenses against already, but
I think I would find that difficult to prove. The intent of the code
is almost as important as the code, at least in my opinion.

[1] 
postgr.es/m/CAH2-Wz=vmdh8pfazx9wah9bn5ast5vrna0xsz+gsfrs12bp...@mail.gmail.com
[2] postgr.es/m/11895.1490983884%40sss.pgh.pa.us
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread Tomas Vondra

Thanks. I planned to look into this today, but you've been faster ;-)

regards
Tomas

On 04/04/2017 06:55 PM, Ashutosh Sharma wrote:

Hi,

As I am not seeing any response from Tomas for the last 2-3 days and since
the commit-fest is coming towards its end, I have planned to work on the
review comments that I had given a few days back and submit the updated
patch. PFA new version of patch that takes care of all the review
comments given by me. I have also run pgindent on btreefuncs.c file
and have run some basic test cases. All looks fine to me now!

Please note that this patch still belongs to Tomas not me. I still
remain the reviewer of this patch. The same thing has been very
clearly mentioned in the attached patch. The only intention behind
Ashutosh (reviewer) working on this patch is to ensure that we do not
miss the things that can easily get committed in this commit-fest.
Thanks.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics (v25)

2017-04-04 Thread Tomas Vondra

On 04/04/2017 09:55 AM, David Rowley wrote:

On 1 April 2017 at 04:25, David Rowley  wrote:

I've attached an updated patch.


I've made another pass at this and ended up removing the tryextstats
variable. We now only try to use extended statistics when
clauselist_selectivity() is given a valid RelOptInfo with rtekind ==
RTE_RELATION, and of course, it must also have some extended stats
defined too.

I've also cleaned up a few more comments, many of which I managed to
omit updating when I refactored how the selectivity estimation ties
into clauselist_selectivity()

I'm quite happy with all of this now, and would also be happy for
other people to take a look and comment.

As a reviewer, I'd be marking this ready for committer, but I've moved
a little way from just reviewing this now, having spent two weeks
hacking at it.

The latest patch is attached.



Thanks David, I agree the reworked patch is much cleaner than the last 
version I posted. Thanks for spending your time on it.


Two minor comments:

1) DEPENDENCY_MIN_GROUP_SIZE

I'm not sure we still need the min_group_size when evaluating 
dependencies. It was meant to deal with 'noisy' data, but I think that 
after switching to the 'degree' it might actually be a bad idea.


Consider this:

create table t (a int, b int);
insert into t select 1, 1 from generate_series(1, 1) s(i);
insert into t select i, i from generate_series(2, 2) s(i);
create statistics s with (dependencies) on (a,b) from t;
analyze t;

select stadependencies from pg_statistic_ext ;
  stadependencies

 [{1 => 2 : 0.44}, {2 => 1 : 0.44}]
(1 row)

So the degree of the dependency is just ~0.333 although it's obviously a 
perfect dependency, i.e. a knowledge of 'a' determines 'b'. The reason 
is that we discard 2/3 of rows, because those groups are only a single 
row each, except for the one large group (1/3 of rows).


Without the mininum group size limitation, the dependencies are:

test=# select stadependencies from pg_statistic_ext ;
  stadependencies

 [{1 => 2 : 1.00}, {2 => 1 : 1.00}]
(1 row)

which seems way more reasonable, I think.


2) A minor detail is that instead of this

if (estimatedclauses != NULL &&
bms_is_member(listidx, estimatedclauses))
continue;

perhaps we should do just this:

if (bms_is_member(listidx, estimatedclauses))
continue;

bms_is_member does the same NULL check right at the beginning, so I 
don't think this will make a measurable difference.
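
For reference, bms_is_member() starts out roughly like this, so the extra 
check only saves a function call:

    if (x < 0)
        elog(ERROR, "negative bitmapset member not allowed");
    if (a == NULL)
        return false;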



kind regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical replication launcher crash on buildfarm

2017-04-04 Thread Petr Jelinek
So this is what I came up with on the plane. I generalized the loading
functionality into load_library_function, which can load either
known postgres functions or call load_external_function.

I am not quite sure if fmgr.c is the best place to put it, but I didn't want
to include stuff from the executor in bgworker.c. I was thinking about
putting it in dfmgr.c (as that's where load_external_function lives), but it
again seemed weird to include executor stuff there.
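
Roughly the shape of it, as a sketch only (the internal lookup helper here
is made up; see the attached patch for the real thing):

static bgworker_main_type
load_library_function(const char *libraryname, const char *funcname)
{
    /* entry points built into the postgres binary itself */
    if (strcmp(libraryname, "postgres") == 0)
        return lookup_internal_entrypoint(funcname);    /* made-up helper */

    /* otherwise fall back to the regular dlopen-based lookup */
    return (bgworker_main_type)
        load_external_function(libraryname, funcname, true, NULL);
}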

As with previous patch, 9.6 will need quite different patch as we need
to keep compatibility there, but since I am traveling I think it's
better to share what I have so far.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/README.parallel b/src/backend/access/transam/README.parallel
index db9ac3d..b360887 100644
--- a/src/backend/access/transam/README.parallel
+++ b/src/backend/access/transam/README.parallel
@@ -198,7 +198,7 @@ pattern looks like this:
 
 	EnterParallelMode();		/* prohibit unsafe state changes */
 
-	pcxt = CreateParallelContext(entrypoint, nworkers);
+	pcxt = CreateParallelContext("library_name", "function_name", nworkers);
 
 	/* Allow space for application-specific data here. */
 	shm_toc_estimate_chunk(&pcxt->estimator, size);
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b3d3853..326d4f9 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -61,7 +61,7 @@
 #define PARALLEL_KEY_TRANSACTION_SNAPSHOT	UINT64CONST(0x0006)
 #define PARALLEL_KEY_ACTIVE_SNAPSHOT		UINT64CONST(0x0007)
 #define PARALLEL_KEY_TRANSACTION_STATE		UINT64CONST(0x0008)
-#define PARALLEL_KEY_EXTENSION_TRAMPOLINE	UINT64CONST(0x0009)
+#define PARALLEL_KEY_ENTRYPOINT				UINT64CONST(0x0009)
 
 /* Fixed-size parallel state. */
 typedef struct FixedParallelState
@@ -77,9 +77,6 @@ typedef struct FixedParallelState
 	pid_t		parallel_master_pid;
 	BackendId	parallel_master_backend_id;
 
-	/* Entrypoint for parallel workers. */
-	parallel_worker_main_type entrypoint;
-
 	/* Mutex protects remaining fields. */
 	slock_t		mutex;
 
@@ -109,7 +106,6 @@ static dlist_head pcxt_list = DLIST_STATIC_INIT(pcxt_list);
 
 /* Private functions. */
 static void HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg);
-static void ParallelExtensionTrampoline(dsm_segment *seg, shm_toc *toc);
 static void WaitForParallelWorkersToExit(ParallelContext *pcxt);
 
 
@@ -119,7 +115,8 @@ static void WaitForParallelWorkersToExit(ParallelContext *pcxt);
  * destroyed before exiting the current subtransaction.
  */
 ParallelContext *
-CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
+CreateParallelContext(const char *library_name, const char *function_name,
+	  int nworkers)
 {
 	MemoryContext oldcontext;
 	ParallelContext *pcxt;
@@ -152,7 +149,8 @@ CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
 	pcxt = palloc0(sizeof(ParallelContext));
 	pcxt->subid = GetCurrentSubTransactionId();
 	pcxt->nworkers = nworkers;
-	pcxt->entrypoint = entrypoint;
+	pcxt->library_name = pstrdup(library_name);
+	pcxt->function_name = pstrdup(function_name);
 	pcxt->error_context_stack = error_context_stack;
 	shm_toc_initialize_estimator(&pcxt->estimator);
 	dlist_push_head(&pcxt_list, &pcxt->node);
@@ -164,33 +162,6 @@ CreateParallelContext(parallel_worker_main_type entrypoint, int nworkers)
 }
 
 /*
- * Establish a new parallel context that calls a function provided by an
- * extension.  This works around the fact that the library might get mapped
- * at a different address in each backend.
- */
-ParallelContext *
-CreateParallelContextForExternalFunction(char *library_name,
-		 char *function_name,
-		 int nworkers)
-{
-	MemoryContext oldcontext;
-	ParallelContext *pcxt;
-
-	/* We might be running in a very short-lived memory context. */
-	oldcontext = MemoryContextSwitchTo(TopTransactionContext);
-
-	/* Create the context. */
-	pcxt = CreateParallelContext(ParallelExtensionTrampoline, nworkers);
-	pcxt->library_name = pstrdup(library_name);
-	pcxt->function_name = pstrdup(function_name);
-
-	/* Restore previous memory context. */
-	MemoryContextSwitchTo(oldcontext);
-
-	return pcxt;
-}
-
-/*
  * Establish the dynamic shared memory segment for a parallel context and
  * copy state and other bookkeeping information that will be needed by
  * parallel workers into it.
@@ -249,15 +220,10 @@ InitializeParallelDSM(ParallelContext *pcxt)
 		pcxt->nworkers));
 		shm_toc_estimate_keys(&pcxt->estimator, 1);
 
-		/* Estimate how much we'll need for extension entrypoint info. */
-		if (pcxt->library_name != NULL)
-		{
-			Assert(pcxt->entrypoint == ParallelExtensionTrampoline);
-			Assert(pcxt->function_name != NULL);
-			shm_toc_estimate_chunk(&pcxt->estimator, strlen(pcxt->library_name)
-	

Re: [HACKERS] Compiler warning in costsize.c

2017-04-04 Thread Tom Lane
Michael Paquier  writes:
> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can
> generate warnings. Here is for example with MSVC:
>   src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : 
> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]
>   src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : 
> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj]

> a9c074ba7 has done an effort, but a bit more is needed as the attached.

That doesn't look right at all:

+#ifdef USE_ASSERT_CHECKING
RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;
+#endif

The entire point of the PG_USED_FOR_ASSERTS_ONLY annotation is to suppress
this type of warning without a need for an explicit #ifdef like that one.

We should either fix PG_USED_FOR_ASSERTS_ONLY to work on MSVC as well,
or decide that we don't care about suppressing such warnings on MSVC,
or give up on PG_USED_FOR_ASSERTS_ONLY and remove it everywhere in
favor of plain #ifdefs.

(I'm personally not that much in love with PG_USED_FOR_ASSERTS_ONLY,
because it tends to confuse pgindent.)
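
For reference, the way the annotation expands today is roughly this (see
c.h for the exact compiler tests):

#ifdef __GNUC__                 /* plus a couple of other compilers */
#define pg_attribute_unused() __attribute__((unused))
#else
#define pg_attribute_unused()   /* MSVC ends up with an empty expansion */
#endif

#ifdef USE_ASSERT_CHECKING
#define PG_USED_FOR_ASSERTS_ONLY
#else
#define PG_USED_FOR_ASSERTS_ONLY pg_attribute_unused()
#endif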

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Tomas Vondra

On 04/04/2017 06:52 PM, Robert Haas wrote:

On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh  wrote:

On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas  wrote:

On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
 wrote:

2. the server restarts automatically, initialize
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in the shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count.


Hmm.  So this seems like the root of the problem.  Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.


IMHO, the fix would be not to increase the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.


While I'm not opposed to that approach, I don't think this is a good
way to implement it.  If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine.  But with what you've got here, you're
essentially relying on "spooky action at a distance".  It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.



I'm probably missing something, but I don't quite understand how these 
values actually survive the crash. I mean, what I observed is OOM 
followed by a restart, so shouldn't BackgroundWorkerShmemInit() simply 
reset the values back to 0? Or do we call ForgetBackgroundWorker() after 
the crash for some reason?



In any case, the comment right before BackgroundWorkerArray says this:

 * These counters can of course overflow, but it's not important here
 * since the subtraction will still give the right number.

which means that this assert

+   Assert(BackgroundWorkerData->parallel_register_count >=
+   BackgroundWorkerData->parallel_terminate_count);

is outright broken, just like any other attempts to rely on simple 
comparisons of these two counters, no?
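
A contrived illustration, assuming uint32 counters:

    uint32      reg = 5;                /* has wrapped past UINT32_MAX */
    uint32      term = UINT32_MAX - 2;
    uint32      active = reg - term;    /* == 8, still correct */

    Assert(reg >= term);                /* ... but this would fail */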



BTW, if this isn't on the open items list, it should be.  It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.



Unrelated, but perhaps we should also add this to open items:

https://www.postgresql.org/message-id/flat/57bbc52c-5d40-bb80-2992-7e1027d0b67c%402ndquadrant.com

(although that's more a 9.6 material).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-04-04 Thread Andres Freund
On 2017-04-04 13:06:13 +0300, Stas Kelvich wrote:
> That is just argument against Andres concern that prepared transaction
> is able to deadlock with decoding process — at least no such cases in
> regression tests.

There are few longer / adverse xacts; that doesn't say much.


> And that concern is the main thing blocking this patch. Except for explicit
> catalog locks in a prepared tx, nobody has yet found such cases, and it is
> hard to address or argue about.

I doubt that's the case. But even if it were so, it's absolutely not
acceptable that a plain user can cause such deadlocks. So I don't think
this argument buys you anything.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] identity columns

2017-04-04 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/3/17 14:19, Andres Freund wrote:
> + *op->resvalue = 
> Int64GetDatum(nextval_internal(op->d.nextvalueexpr.seqid, false));

>> Is it guaranteed that the caller expects an int64?  I saw that
>> nextvalueexpr's have a typeid field.

> It expects one of the integer types.  We could cast the result of
> Int64GetDatum() to the appropriate type, but that wouldn't actually do
> anything.

Uh, really?  On 32-bit platforms, int64 and int32 datums have entirely
different representations.
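
For the record, a minimal sketch of what that means (assuming a 32-bit build
where int64 is a pass-by-reference type; this is illustration only, not code
from the patch):

static void
datum_repr_example(void)
{
    Datum    d64 = Int64GetDatum((int64) 42);    /* Datum holds a pointer to a palloc'd int64 */
    Datum    d32 = Int32GetDatum(42);            /* Datum holds the value itself */
    int32    ok = DatumGetInt32(d32);            /* fine: 42 */
    int32    bogus = DatumGetInt32(d64);         /* wrong: the pointer's bits, not 42 */

    (void) ok;
    (void) bogus;
}

So the value would have to be packed with the Int16GetDatum/Int32GetDatum/
Int64GetDatum macro that matches the declared type, not just "cast" after
the fact.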

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical decoding of two-phase transactions

2017-04-04 Thread Masahiko Sawada
On Tue, Apr 4, 2017 at 7:06 PM, Stas Kelvich  wrote:
>
>> On 4 Apr 2017, at 04:23, Masahiko Sawada  wrote:
>>
>>
>> I reviewed this patch but when I tried to build contrib/test_decoding
>> I got the following error.
>>
>
> Thanks!
>
> Yes, seems that 18ce3a4a changed ProcessUtility_hook signature.
> Updated.
>
>> There are still some unnecessary code in v5 patch.
>

Thank you for updating the patch!

> Actually the second diff isn't intended to be part of the patch; I've just
> shared the way I ran the regression test suite through the 2PC decoding,
> changing all commits to prepare/commit, where the commit happens only after
> decoding of the prepare is finished (more details in my previous message in
> this thread).

Understood. Sorry for the noise.

>
> That is just an argument against Andres' concern that a prepared transaction
> is able to deadlock with the decoding process — at least there are no such
> cases in the regression tests.
>
> And that concern is the main thing blocking this patch. Except for explicit
> catalog locks in a prepared tx, nobody has yet found such cases, and it is
> hard to address or argue about.
>

Hmm, I also have not found such a deadlock case yet.

Other than that issue, the current patch still could not pass the 'make check'
test of contrib/test_decoding.

*** 154,167 
  (4 rows)

  :get_with2pc
!   data
! -
!  BEGIN
!  table public.test_prepared1: INSERT: id[integer]:5
!  table public.test_prepared1: INSERT: id[integer]:6 data[text]:'frakbar'
!  PREPARE TRANSACTION 'test_prepared#3';
!  COMMIT PREPARED 'test_prepared#3';
! (5 rows)

  -- make sure stuff still works
  INSERT INTO test_prepared1 VALUES (8);
--- 154,162 
  (4 rows)

  :get_with2pc
!  data
! --
! (0 rows)

  -- make sure stuff still works
  INSERT INTO test_prepared1 VALUES (8);

I guess that this part is an unexpected result and should be fixed. Right?

-

*** 215,222 
  -- If we try to decode it now we'll deadlock
  SET statement_timeout = '10s';
  :get_with2pc_nofilter
! -- FIXME we expect a timeout here, but it actually works...
! ERROR: statement timed out

  RESET statement_timeout;
  -- we can decode past it by skipping xacts with catalog changes
--- 210,222 
  -- If we try to decode it now we'll deadlock
  SET statement_timeout = '10s';
  :get_with2pc_nofilter
! data
! 
!  BEGIN
!  table public.test_prepared1: INSERT: id[integer]:10 data[text]:'othercol'
!  table public.test_prepared1: INSERT: id[integer]:11 data[text]:'othercol2'
!  PREPARE TRANSACTION 'test_prepared_lock'
! (4 rows)

  RESET statement_timeout;
  -- we can decode past it by skipping xacts with catalog changes

Probably we can ignore this part for now.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Instead of DROP function use UPDATE pg_proc in an upgrade extension script

2017-04-04 Thread Pavel Stehule
2017-04-04 15:40 GMT+02:00 Vicky Vergara :

> Thanks,
>
>
> > It is not safe due to views - they are saved in post-analyze form.
>
>
> What is post-analyze form? Any link that you can give me to read about it?
>

The query pipeline is: parsing, analysis, optimization, execution.

When you change an API, the analyze stage would need to be processed again,
but views are stored as post-analyzed, serialized data. You cannot redo this
process without the original source text.

Regards

Pavel

>
> Thanks
>
> --
> *De:* Pavel Stehule 
> *Enviado:* lunes, 3 de abril de 2017 11:21 p. m.
> *Para:* Vicky Vergara
> *Cc:* pgsql-hackers@postgresql.org
> *Asunto:* Re: [HACKERS] Instead of DROP function use UPDATE pg_proc in an
> upgrade extension script
>
>
>
> 2017-04-04 6:17 GMT+02:00 Vicky Vergara :
>
>>
>> Hello,
>>
>>
>> When creating an extension upgrade SQL script, if the function does not
>> have the same parameter names and/or parameter types, and/or the result
>> type changes, the function needs to be dropped, because otherwise the
>> CREATE OR REPLACE with the new signature will fail.
>>
>>
>> So for example:
>>
>> having the following function:
>>
>>
>> SELECT proallargtypes, proargmodes, proargnames FROM pg_proc WHERE
>> proallargtypes = '{25,20,20,16,23,23,20,20}'   AND proname =
>> 'pgr_edgedisjointpaths';
>> -[ RECORD 1 ]--+-
>> proallargtypes | {25,20,20,16,23,23,20,20}
>> proargmodes| {i,i,i,i,o,o,o,o}
>> proargnames| {"","","",directed,seq,path_seq,node,edge}
>>
>>
>> When adding extra OUT parameters, because the result types (&names)
>> change, the function needs a DROP:
>>
>> -- Row type defined by OUT parameters is different
>>
>>  ALTER EXTENSION pgrouting DROP FUNCTION pgr_edgedisjointpaths(text,bigint,bigint,boolean);
>>
>>  DROP FUNCTION IF EXISTS pgr_edgedisjointpaths(text,bigint,bigint,boolean);
>>
>>
>> but doing that, objects that depend on the function, like a view, get
>> dropped when using CASCADE in the ALTER EXTENSION, and functions that use
>> pgr_edgedisjointpaths internally don't get dropped.
>>
>>
>> So, I must say that I experimented: instead of doing the drop, I made:
>>
>>
>> UPDATE pg_proc SET
>>
>>   proallargtypes = '{25,20,20,16,23,23,23,20,20,701,701}',
>>
>>   proargmodes = '{i,i,i,i,o,o,o,o,o,o,o}',
>>
>>proargnames = '{"","","","directed","seq","path_id","path_seq","node","edge","cost","agg_cost"}'
>>
>>  WHERE proallargtypes = '{25,20,20,16,23,23,20,20}'   AND proname =
>> 'pgr_edgedisjointpaths';
>>
>>
>> And CASCADE was not needed, and the view remained intact.
>>
>>
>> So, I want to know how "safe" you consider the second method to be, and
>> what other kinds of objects I need to test besides views.
>>
>
> It is not safe due views - that are saved in post analyze form.
>
> Regards
>
> Pavel
>
>> My plan is to use the second method:
>>
>> - when the current names of the OUT parameters don't change, and there is
>> an additional OUT parameter
>>
>> - when the current names of the IN parameters don't change, and there is
>> an additional IN parameter with a default value
>>
>>
>> Thanks
>>
>>
>> Vicky Vergara
>>
>>
>>
>>
>>
>>
>>
>>
>


Re: [HACKERS] pg_partman 3.0.0 - real-world usage of native partitioning and a case for native default

2017-04-04 Thread Keith Fiske
On Mon, Apr 3, 2017 at 11:33 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
>
>>>


 Thankfully since native partitioning still uses inheritance internally
 for the most part, pg_partman works pretty well without nearly as much
 change as I thought I would need. The biggest deficiency I'm seeing has to
 do with not having a "default" partition to put data that doesn't match any
 children. The fact that it throws an error is a concern, but it's not where
 I see the main problem. Where this really comes into play is when someone
 wants to make an existing table into a partitioned table. There's really no
 easy way to do this outside of making a completely brand new partition set
 and copying/moving the data from the old to the new.

>>>
>>> If there are multiple partitions, there is likely to be more data that
>>> needs to be moved than is retained in the old table. So, creating a
>>> completely brand new partition set and copying/moving the data is
>>> annoying, but not as much as it sounds. Obviously, if we could avoid it,
>>> we should try to.
>>>
>>
>> Not sure I follow what you're saying here. With pg_partman, making the
>> old table the parent and still containing all the data has caused no issues
>> when I've migrated clients to it, nor has anyone reported any issues to me
>> with this system. New data goes to the child tables as they should and old
>> data is then moved when convenient. It makes things work very smoothly and
>> the only outage encountered is a brief lock at creation time.
>>
>
> In partitioning, partitioned table doesn't have any storage. Data that
> belongs to a given partition is expected to be there from day one.
>
>
>> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

I understand that with native, the parent has no data and never will. I'm
just using pg_partman's method as an example. The DEFAULT would take the
place of the parent table in this situation. Yes, in an ideal world, all
the data would be in the children right from the beginning when you declare
the parent. But that hardly seems realistic, especially when you have to
partition an existing billion+ row table and keep it running at the same
time. Yes, the basic purpose of the default is to catch data that gets
inserted outside the current child existence. That's what I also do with
the parent in pg_partman. But it can also serve as a method to ease
migration, as the parent also does in pg_partman's trigger-based method.

Keith


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Joe Conway
On 04/04/2017 09:55 AM, Mike Palmiotto wrote:
> On Tue, Apr 4, 2017 at 10:19 AM, Joe Conway  wrote:
>> On 04/04/2017 06:45 AM, Robert Haas wrote:
>>> On Mon, Apr 3, 2017 at 12:02 PM, Joe Conway  wrote:
> 0002 looks extremely straightforward, but I wonder if we could get one
> of the people on this list who knows about sepgsql to have a look?
> (Stephen Frost, Joe Conway, KaiGai Kohei...)

 Will have a look later today.
>>>
>>> I think it is now tomorrow, unless your time zone is someplace very
>>> far away.  :-)
>>
>> OBE -- I have scheduled time in 30 minutes from now, after I have gotten
>> my first cup of coffee ;-)
> 
> After some discussion off-list, I've rebased and updated the patches.
> Please see attached for further review.

Thanks -- will have another look and test on a machine with selinux
setup. Robert, did you want me to take responsibility to commit on this
or just provide review/feedback?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-04 Thread Mike Palmiotto
On Tue, Apr 4, 2017 at 10:19 AM, Joe Conway  wrote:
> On 04/04/2017 06:45 AM, Robert Haas wrote:
>> On Mon, Apr 3, 2017 at 12:02 PM, Joe Conway  wrote:
 0002 looks extremely straightforward, but I wonder if we could get one
 of the people on this list who knows about sepgsql to have a look?
 (Stephen Frost, Joe Conway, KaiGai Kohei...)
>>>
>>> Will have a look later today.
>>
>> I think it is now tomorrow, unless your time zone is someplace very
>> far away.  :-)
>
> OBE -- I have scheduled time in 30 minutes from now, after I have gotten
> my first cup of coffee ;-)

After some discussion off-list, I've rebased and updated the patches.
Please see attached for further review.

Thanks,
-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com
From be692f8cc94d74033494d140c310e932c705e785 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Wed, 29 Mar 2017 14:59:37 +
Subject: [PATCH 2/2] Add partitioned table support to sepgsql

Account for RELKIND_PARTITIONED_TABLE in sepgsql and treat the objects like
regular relations. This allows for proper object_access hook behavior for
partitioned tables.
---
 contrib/sepgsql/label.c|  3 ++-
 contrib/sepgsql/relation.c | 33 +++--
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 8a72503..c66581a 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -787,7 +787,8 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId)
 			case RelationRelationId:
 relForm = (Form_pg_class) GETSTRUCT(tuple);
 
-if (relForm->relkind == RELKIND_RELATION)
+if (relForm->relkind == RELKIND_RELATION ||
+	relForm->relkind == RELKIND_PARTITIONED_TABLE)
 	objtype = SELABEL_DB_TABLE;
 else if (relForm->relkind == RELKIND_SEQUENCE)
 	objtype = SELABEL_DB_SEQUENCE;
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index 4dc48a0..4fcebc1 100644
--- a/contrib/sepgsql/relation.c
+++ b/contrib/sepgsql/relation.c
@@ -54,12 +54,14 @@ sepgsql_attribute_post_create(Oid relOid, AttrNumber attnum)
 	ObjectAddress object;
 	Form_pg_attribute attForm;
 	StringInfoData audit_name;
+	char		relkind;
 
 	/*
 	 * Only attributes within regular relation have individual security
 	 * labels.
 	 */
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	relkind = get_rel_relkind(relOid);
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
 		return;
 
 	/*
@@ -135,8 +137,10 @@ sepgsql_attribute_drop(Oid relOid, AttrNumber attnum)
 {
 	ObjectAddress object;
 	char	   *audit_name;
+	char		relkind;
 
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	relkind = get_rel_relkind(relOid);
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
 		return;
 
 	/*
@@ -167,8 +171,11 @@ sepgsql_attribute_relabel(Oid relOid, AttrNumber attnum,
 {
 	ObjectAddress object;
 	char	   *audit_name;
+	char		relkind;
 
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	relkind = get_rel_relkind(relOid);
+
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("cannot set security label on non-regular columns")));
@@ -209,8 +216,11 @@ sepgsql_attribute_setattr(Oid relOid, AttrNumber attnum)
 {
 	ObjectAddress object;
 	char	   *audit_name;
+	char		relkind;
+
+	relkind = get_rel_relkind(relOid);
 
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
 		return;
 
 	/*
@@ -290,6 +300,7 @@ sepgsql_relation_post_create(Oid relOid)
 
 	switch (classForm->relkind)
 	{
+		case RELKIND_PARTITIONED_TABLE:
 		case RELKIND_RELATION:
 			tclass = SEPG_CLASS_DB_TABLE;
 			break;
@@ -336,7 +347,8 @@ sepgsql_relation_post_create(Oid relOid)
   true);
 
 	/*
-	 * Assign the default security label on the new relation
+	 * Assign the default security label on the new relation or partitioned
+	 * table.
 	 */
 	object.classId = RelationRelationId;
 	object.objectId = relOid;
@@ -344,10 +356,10 @@ sepgsql_relation_post_create(Oid relOid)
 	SetSecurityLabel(&object, SEPGSQL_LABEL_TAG, rcontext);
 
 	/*
-	 * We also assigns a default security label on columns of the new regular
-	 * tables.
+	 * We also assign a default security label on columns of a new table.
 	 */
-	if (classForm->relkind == RELKIND_RELATION)
+	if (classForm->relkind == RELKIND_RELATION ||
+		classForm->relkind == RELKIND_PARTITIONED_TABLE)
 	{
 		Relation	arel;
 		ScanKeyData akey;
@@ -422,6 +434,7 @@ sepgsql_relation_drop(Oid relOid)
 	relkind = get_rel_relkind(relOid);
 	switch (relkind)
 	{
+		case RELKIND_PARTITIONED_TABLE:
 		case RELKIND_RELATION:
 			tclass = SEPG_CLASS_DB_TABLE;
 			break;
@@ -485,7 +498,7 @@ sepgsql_relation_drop(Oid relOid)
 	/*
 	 * check db_column:{drop} permission
 	 */
-	if (relkind == RELKIND_RELATION)
+	if (relkind == RE

Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-04-04 Thread Ashutosh Sharma
Hi,

As I have not seen any response from Tomas for the last 2-3 days, and since
the commitfest is coming to an end, I have decided to work on the review
comments that I gave a few days back and submit the updated patch. PFA a new
version of the patch that takes care of all the review comments given by me.
I have also run pgindent on the btreefuncs.c file and run some basic test
cases. All looks fine to me now!

Please note that this patch still belongs to Tomas, not me; I remain the
reviewer of this patch. The same thing has been very clearly mentioned in
the attached patch. The only intention behind me (the reviewer) working on
this patch is to ensure that we do not miss something that could easily get
committed in this commit-fest. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

On Tue, Apr 4, 2017 at 7:23 PM, David Steele  wrote:
> On 4/4/17 9:43 AM, Robert Haas wrote:
>> On Tue, Apr 4, 2017 at 9:32 AM, David Steele  wrote:
>>> My goal is to help people focus on patches that have a chance.  At this
>>> point I think that includes poking authors who are not being responsive
>>> using the limited means at my disposal.
>>
>> +1.  Pings on specific threads can help clear things up when, for
>> example, the author and reviewer are each waiting for the other.  And,
>> as you say, they also help avoid the situation where a patch just
>> falls off the radar and misses the release for no especially good
>> reason, which naturally causes people frustration.
>>
>> I think your pings have been quite helpful, although I think it would
>> have been better in some cases if you'd done them sooner.  Pinging
>> after a week, with a 3 day deadline, when there are only a few days
>> left in the CommitFest isn't really leaving a lot of room to maneuver.
>
> Thanks for the feedback!  My thinking is that I don't want to bug people
> too soon, but there's a maximum number of days a thread should be idle.
> Over the course of the CF that has gone from 10 days, to 7, 5, and 3.
>
> I don't look at all patches every day so it can be a bit uneven, i.e.,
> all patches are allowed a certain amount of idle time, but some may get a
> bit more depending on when I check up on them.
>
> Thanks,
> --
> -David
> da...@pgmasters.net
From 6d9814182eb03521f093d687eb8024c9df1c11d5 Mon Sep 17 00:00:00 2001
From: ashu 
Date: Tue, 4 Apr 2017 21:38:42 +0530
Subject: [PATCH] pageinspect: Add bt_page_items function with bytea v6

Author: Tomas Vondra 
Reviewed-by: Ashutosh Sharma 
---
 contrib/pageinspect/btreefuncs.c  | 198 +++---
 contrib/pageinspect/expected/btree.out|  13 ++
 contrib/pageinspect/pageinspect--1.5--1.6.sql |  14 ++
 contrib/pageinspect/sql/btree.sql |   4 +
 doc/src/sgml/pageinspect.sgml |  31 
 src/include/access/nbtree.h   |   1 +
 6 files changed, 210 insertions(+), 51 deletions(-)

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 3d69aa9..02440ec 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
 
 PG_FUNCTION_INFO_V1(bt_metap);
 PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -235,14 +236,6 @@ bt_page_stats(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(result);
 }
 
-/*---
- * bt_page_items()
- *
- * Get IndexTupleData set in a btree page
- *
- * Usage: SELECT * FROM bt_page_items('t1_pkey', 1);
- *---
- */
 
 /*
  * cross-call data structure for SRF
@@ -253,14 +246,72 @@ struct user_args
 	OffsetNumber offset;
 };
 
+/*---
+ * bt_page_print_tuples()
+ *
+ * Form a tuple describing index tuple at a given offset
+ * --
+ */
+static Datum
+bt_page_print_tuples(FuncCallContext *fctx, Page page, OffsetNumber offset)
+{
+	char	   *values[6];
+	HeapTuple	tuple;
+	ItemId		id;
+	IndexTuple	itup;
+	int			j;
+	int			off;
+	int			dlen;
+	char	   *dump;
+	char	   *ptr;
+
+	id = PageGetItemId(page, offset);
+
+	if (!ItemIdIsValid(id))
+		elog(ERROR, "invalid ItemId");
+
+	itup = (IndexTuple) PageGetItem(page, id);
+
+	j = 0;
+	values[j++] = psprintf("%d", offset);
+	values[j++] = psprintf("(%u,%u)",
+		   ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
+		   ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
+	values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+	values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+	values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+	ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+	dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+	dump = palloc0(dlen * 3 + 1);
+	values[j] = dump;
+	

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Robert Haas
On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh  wrote:
> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas  wrote:
>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>  wrote:
>>> 2. the server restarts automatically and initializes
>>> BackgroundWorkerData->parallel_register_count and
>>> BackgroundWorkerData->parallel_terminate_count in shared memory.
>>> After that, it calls ForgetBackgroundWorker, which increments
>>> parallel_terminate_count.
>>
>> Hmm.  So this seems like the root of the problem.  Presumably those
>> things need to be reset AFTER forgetting any background workers from
>> before the crash.
>>
> IMHO, the fix would be not to increase the terminated parallel worker
> count whenever ForgetBackgroundWorker is called due to a bgworker
> crash. I've attached a patch for the same. PFA.

While I'm not opposed to that approach, I don't think this is a good
way to implement it.  If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine.  But with what you've got here, you're
essentially relying on "spooky action at a distance".  It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.
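
To be clear about the alternative I mean, roughly this shape (only a sketch
of the signature change, with most of the function body elided; not a
drop-in replacement for what's in bgworker.c):

void
ForgetBackgroundWorker(slist_mutable_iter *cur, bool count_terminated)
{
    RegisteredBgWorker *rw;

    rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);

    /* the caller states its intent instead of the callee guessing from state */
    if (count_terminated &&
        (rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
        BackgroundWorkerData->parallel_terminate_count++;

    /* ... release the shmem slot, unlink from the list, free rw ... */
}

Then the crash-cleanup path can pass false and everything else passes true.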

BTW, if this isn't on the open items list, it should be.  It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequence data type

2017-04-04 Thread Peter Eisentraut
On 3/30/17 22:47, Vitaly Burovoy wrote:
> It seemed not very hard to fix it.
> Please find attached patch to be applied on top of your one.
> 
> I've added more tests to cover different cases of changing bounds when
> data type is changed.

Committed all that.  Thanks!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ANALYZE command progress checker

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote
 wrote:
> Hmm, you're right.  It could be counted with a separate variable
> initialized to 0 and incremented every time we decide to add a row to the
> final set of sampled rows, although different implementations of
> AcquireSampleRowsFunc have different ways of deciding if a given row will
> be part of the final set of sampled rows.
>
> On the other hand, if we decide to count progress in terms of blocks as
> you suggested, I'm afraid that FDWs won't be able to report the
> progress.

I think it may be time to push this patch out to v11.  It was
submitted one day before the start of the last CommitFest, the design
wasn't really right, and it's not clear even now that we know what the
right design is.  And we're pretty much out of time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Instead of DROP function use UPDATE pg_proc in an upgrade extension script

2017-04-04 Thread Robert Haas
On Tue, Apr 4, 2017 at 9:07 AM, Vicky Vergara  wrote:
> you answered so fast that I know I am stepping into dangerous grounds.
>
> But I would like to know more about your experience.
>
> Any links that you can give me to read about the code and/or issues
> regarding the ip4r experience?

I can't comment on that, but in general I don't think there's an issue
if (1) your UPDATE statement contains no bugs and (2) the DROP
statement would have succeeded without CASCADE.  The problem is when
there's stuff depending on the existing function definition - such as
views.  Even then it may work if the dependencies are such that the
new definition is compatible enough for purposes of the dependent
objects, but if not then you've got trouble.

To put this another way, if it were safe for CREATE OR REPLACE to
succeed here, we would have made it succeed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-04-04 Thread David Steele
On 4/4/17 11:42 AM, Stephen Frost wrote:
> * David Steele (da...@pgmasters.net) wrote:
>> On 3/22/17 4:42 PM, Peter Eisentraut wrote:
>>> On 3/22/17 15:14, Stephen Frost wrote:
> -SELECT * FROM pg_stop_backup(false);
> +SELECT * FROM pg_stop_backup(false [, true ]);
>
> I think that it's better to get rid of "[" and "]" from the above because
> IMO this should be the command example that users actually can run.
 Using the '[' and ']' are how all of the optional arguments are
 specified in the documentation, see things like current_setting() in our
 existing documentation:
>>>
>>> In the synopsis, but not in concrete examples.
>>
>> Wasn't quite sure what the protocol was in this case, and figured it
>> was easier to subtract than to add.
> 
> Forgot to close this out sooner, apologies.
> 
> This has been committed now.

Thank you, Stephen!

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-04-04 Thread Maksim Milyutin

On 01.03.2017 13:53, Maksim Milyutin wrote:

Hi hackers!

As I've understood from thread [1], the main issue with creating local
indexes for partitions is supporting REINDEX and DROP INDEX operations
on parent partitioned tables. Furthermore, Robert Haas mentioned the
problem of creating an index on a key that is represented in partitions
by a single value (or a primitive interval) [1], i.e. under
list-partitioning or range-partitioning with a unit interval.

I would like to propose the following solution:

1. Create indexes for the hierarchy of partitioned tables and partitions
recursively. Don't create relfilenodes for indexes on parents, only
entries in the catalog (much like the partitioned table's storage
elimination in [2]). The abstract index on a partitioned table exists
only to reference the indexes of child tables, so that REINDEX and DROP
INDEX operations can be performed.

2. Record the created indexes in the pg_depend table so that indexes of
child tables depend on the corresponding indexes of parent tables with
dependency type DEPENDENCY_NORMAL, so that an index can be removed
separately for partitions and recursively/separately for partitioned
tables.

3. REINDEX on an index of a partitioned table would perform this
operation on the existing indexes of the corresponding partitions. In
this case it is necessary to handle operations such as REINDEX SCHEMA |
DATABASE | SYSTEM so that partitions' indexes aren't re-indexed multiple
times in a row.

Any thoughts?

1.
https://www.postgresql.org/message-id/CA+TgmoZUwj=qynak+f7xef4w_e2g3xxdmnsnzmzjuinhrco...@mail.gmail.com

2.
https://www.postgresql.org/message-id/2b0d42f2-3a53-763b-c9c2-47139e4b1c2e%40lab.ntt.co.jp




I want to present the first version of patches that implement local 
indexes for partitioned tables and discuss some technical details of 
that implementation.



1. I have added a new relkind for local indexes named 
RELKIND_LOCAL_INDEX (literal 'l').


This was done because physical storage is created in the 'heap_create'
function and we need to skip creating storage, as is done for partitioned
tables. Since the information that this index belongs to a partitioned table
is not available in the 'heap_create' function (the pg_index entry for the
index is not created yet), I chose the least painful way and added a
specific relkind for indexes on partitioned tables.
I suppose this will require integrating the new relkind into different
places in the source code, so I'm ready to consider other proposals on this
point.


2. My implementation doesn't support concurrent creation of a local
index (CREATE INDEX CONCURRENTLY). As I understand it, this operation
involves nontrivial manipulation of snapshots, and I don't know how to
implement concurrent creation of multiple indexes. On this point I ask
for help from the community.


3. As I noted earlier, the pg_depend table is used for cascading deletion
of indexes on a partitioned table and its children. I also use pg_depend
to determine the relationship between parent and child indexes when
REINDEX executes recursively on child indexes.
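
For illustration, recording that relationship could look roughly like this
(a sketch only; childIndexOid/parentIndexOid are placeholders, and this is
not code from the attached patches):

static void
record_child_index_dependency(Oid childIndexOid, Oid parentIndexOid)
{
    ObjectAddress child;
    ObjectAddress parent;

    ObjectAddressSet(child, RelationRelationId, childIndexOid);
    ObjectAddressSet(parent, RelationRelationId, parentIndexOid);

    /* NORMAL dependency, so the child's index can still be dropped on its own */
    recordDependencyOn(&child, &parent, DEPENDENCY_NORMAL);
}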


Perhaps using pg_depend to determine the relationship between parent and
child indexes is not a good way, because the kind of this relationship is
not defined there. I could propose adding to the pg_index table a specific
field of 'oidvector' type that specifies the OIDs of the dependent indexes
for the current local index.



At this stage I want to discuss only the technical details of the local
index implementation. The problems related to merging existing indexes of
partitions into the local index tree, determining the uniqueness of a field
in the global sense through a local index, and syntax notes, I want to
raise later.



CC welcome!

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index cc5ac8b..bec3983 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -154,7 +154,8 @@ index_open(Oid relationId, LOCKMODE lockmode)
 
 	r = relation_open(relationId, lockmode);
 
-	if (r->rd_rel->relkind != RELKIND_INDEX)
+	if (r->rd_rel->relkind != RELKIND_INDEX &&
+		r->rd_rel->relkind != RELKIND_LOCAL_INDEX)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("\"%s\" is not an index",
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fc088b2..26a10e9 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1107,7 +1107,8 @@ doDeletion(const ObjectAddress *object, int flags)
 			{
 char		relKind = get_rel_relkind(object->objectId);
 
-if (relKind == RELKIND_INDEX)
+if (relKind == RELKIND_INDEX ||
+	relKind == RELKIND_LOCAL_INDEX)
 {
 	bool		concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY) != 0);
 
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 36917c8..91ac740 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -293,6 +293,7 @@ heap_create(c

Re: [HACKERS] Making clausesel.c Smarter

2017-04-04 Thread Claudio Freire
On Tue, Apr 4, 2017 at 8:21 AM, David Rowley
 wrote:
> On 4 April 2017 at 13:35, Claudio Freire  wrote:
>> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire  
>> wrote:
>>> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
>>>  wrote:
> One last observation:
>
> +/*
> + * An IS NOT NULL test is a no-op if there's any other strict 
> quals,
> + * so if that's the case, then we'll only apply this, otherwise 
> we'll
> + * ignore it.
> + */
> +else if (cslist->selmask == CACHESEL_NOTNULLTEST)
> +s1 *= cslist->notnullsel;
>
> In some paths, the range selectivity estimation code punts and returns
> DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
> present, it should still be applied. It could make a big difference if
> the selectivity for the nulltest is high.

 I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
 should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
 to exists to estimate that properly. I don't think that needs done as
 part of this patch. I'd rather limit the scope since "returned with
 feedback" is already knocking at the door of this patch.
>>>
>>> I agree, but this patch regresses for those cases if the user
>>> compensated with a not null test, leaving little recourse, so I'd fix
>>> it in this patch to avoid the regression.
>>>
>>> Maybe you're right that the null fraction should be applied regardless
>>> of whether there's an explicit null check, though.
>>
>> The more I read that code, the more I'm convinced there's no way to
>> get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
>> &&
>>  rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
>> clauses, a case the current code handles very poorly, returning
>> DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
>> give way-off estimates if the table had lots of nulls.
>>
>> Say, consider if "value" was mostly null and you write:
>>
>> SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
>> 50 AND 60;
>>
>> All other cases I can think of would end with either hibound or
>> lobound being equal to DEFAULT_INEQ_SEL.
>>
>> It seems to me, doing a git blame, that the checks in the else branch
>> were left only as a precaution, one that seems unnecessary.
>>
>> If you ask me, I'd just leave:
>>
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>> DEFAULT_INEQ_SEL)
>> + {
>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>> implies we have no stats */
>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +   notnullsel = 1.0 - nullsel;
>> +
>> +   /* Adjust for double-exclusion of NULLs */
>> +   s2 += nullsel;
>> +   if (s2 <= 0.0) {
>> +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +  {
>> +   /* Most likely contradictory clauses found */
>> +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +  }
>> +  else
>> +  {
>> +   /* Could be a rounding error */
>> +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +  }
>> +   }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>>
>> But I'd like it if Tom could comment on that, since he's the one that
>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>> in 2004).
>>
>> Barring that, I'd just change the
>>
>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>
>> To
>>
>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>
>> Which makes a lot more sense.
>
> I think to fix this properly the selfuncs would have to return the
> estimate, and nullfrac separately, so the nullfrac could just be
> applied once per expression. There's already hacks in there to get rid
> of the double null filtering. I'm not proposing that as something for
> this patch. It would be pretty invasive to change this.

There's no need, you can compute the nullfrac with nulltestsel. While
there's some rework involved, it's not a big deal (it just reads the
stats tuple), and it's a clean solution.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-h

Re: [HACKERS] Making clausesel.c Smarter

2017-04-04 Thread Claudio Freire
On Tue, Apr 4, 2017 at 8:12 AM, David Rowley
 wrote:
> Result Comparison
>
> Test     Master median tps   Patch median tps   comparison
> Test 1              6993.7             6714.3      104.16%
> Test 2              7053.1             6921.6      101.90%
> Test 3              5137.2             4954.2      103.69%
> Test 4                27.1               19.4      139.72%
> Test 5                54.1               51.4      105.28%
> Test 6              9328.1             9478.2       98.42%
>
> Results Analyzed:
>
> Test 1 has caused planning to slow down 4.16%. There's quite a bit of
> noise from the results, but I think this indicates there is some
> overhead to having to add items to the cslist and searching the cslist
> when new quals are seen.
>
> Test 2 has a lesser slowdown than test 1, as this test will exercise
> the existing rqlist caching in master too. Patched does a little more
> work adding the equality condition to the list too.
>
> Test 3 similar to test 1
>
> Test 4 adds quite an overhead and causes 0.5 million comparisons to
> find the expressions in the cslist.
>
> Test 5 shows less overhead than test 4 since the Master code has to
> also do the expression caching and searching.
>
> Test 6 is a control test

That's consistent with the operation being quadratic.

I was about to point that out, but the old code was quadratic too, so
it sucks in both versions. This version only adds other opexprs to the
list, which makes the quadratic cost easier to run into, but it's nothing
new.

I don't think there's an easy fix for that. You'd have to build a hash
table of expression nodes or something of the sort to avoid the
quadratic cost. If you can do that, by all means, but that's a much
bigger patch.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgres_fdw: support parameterized foreign joins

2017-04-04 Thread Arthur Zakirov

On 23.03.2017 15:45, Etsuro Fujita wrote:


Done.  Also, I added regression tests and revised code and comments a
bit.  (As for create_foreignscan_path(), I haven't done anything about
that yet.)  Please find attached a new version created on top of [1].


Thank you!

I didn't notice that it is necessary to apply the patch 
"epqpath-for-foreignjoin".


I have a few comments.


   * innersortkeys are the sort pathkeys for the inner side of the mergejoin
+  * req_outer_list is a list of sensible parameterizations for the join rel
   */


I think it would be better if the comment explained what type is stored 
in req_outer_list. So the following comment would be good:


"req_outer_list is a list of Relids of sensible parameterizations for 
the join rel"




!   Assert(foreignrel->reloptkind == RELOPT_JOINREL);
!


Here the new macro IS_JOIN_REL() can be used.


!   /* Get the remote and local conditions */
!   remote_conds = 
list_concat(list_copy(remote_param_join_conds),
!  
fpinfo->remote_conds);
!   local_param_join_exprs =
!   get_actual_clauses(local_param_join_conds);
!   local_exprs = 
list_concat(list_copy(local_param_join_exprs),
! 
fpinfo->local_conds);


Is this code correct? 'remote_conds' and 'local_exprs' are initialized 
above when 'scan_clauses' is separated. Maybe better to use 
'remote_conds' and 'local_exprs' instead of 'fpinfo->remote_conds' and 
'fpinfo->local_conds' respectively?


And the last thing: the patch needs rebasing because the new macros
IS_JOIN_REL() and IS_UPPER_REL() were added, and the patch currently
applies with errors.


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Make pg_stop_backup() archive wait optional

2017-04-04 Thread Stephen Frost
* David Steele (da...@pgmasters.net) wrote:
> On 3/22/17 4:42 PM, Peter Eisentraut wrote:
> >On 3/22/17 15:14, Stephen Frost wrote:
> >>>-SELECT * FROM pg_stop_backup(false);
> >>>+SELECT * FROM pg_stop_backup(false [, true ]);
> >>>
> >>>I think that it's better to get rid of "[" and "]" from the above because
> >>>IMO this should be the command example that users actually can run.
> >>Using the '[' and ']' are how all of the optional arguments are
> >>specified in the documentation, see things like current_setting() in our
> >>existing documentation:
> >
> >In the synopsis, but not in concrete examples.
> 
> Wasn't quite sure what the protocol was in this case, and figured it
> was easier to subtract than to add.

Forgot to close this out sooner, apologies.

This has been committed now.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity

2017-04-04 Thread Stephen Frost
Greetings,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> On 31 March 2017 at 04:29, Stephen Frost  wrote:
> 
> > Unless people wish to object, I'll use Michael's patch to remove
> > --verbose from the top level tomorrow.
> 
> Sounds good.
> 
> Maybe add
> 
> To get more detailed output from tests set PROVE_FLAGS='--verbose' or examine
> the detailed test output in tmp_check/logs.
> 
> to src/test/perl/README too?

Committed, with those additions.

Thanks!

Stephen


signature.asc
Description: Digital signature

