Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-09 Thread Pavel Stehule
2017-03-09 7:48 GMT+01:00 Victor Wagner :

> On Wed, 8 Mar 2017 16:49:33 +0100
> Pavel Stehule  wrote:
>
> >
> > I did a review of this patch
> >
> I'm attaching a new version of the patch with the issues you pointed out fixed.
>
> >
> > 4. The documentation is good - although I am not sure if it is well
> > structured - is that level necessary? Probably there will not be
> > any other similar command.
>
> Removed it. Although it is questionable - now there is no
> subsection with the name of the command in the title. But the surrounding
> section describes only one command.
>
>
>
> > 5. There are basic regress tests, and all tests passed, but I miss a
> > path where the subtransaction is committed - now it is rolled back every time
>
> Added a test with a successful subtransaction commit. Just to be sure it is
> really committed.
>
>
> > 6. The code has some whitespace issues
>
> Fixed where I've found it. Now, hopefully, my code uses the same indentation
> rules as the rest of pltcl.c.
>
>
> > 7. I don't understand why TopTransactionContext is used there. Maybe some
> > existing context should be used there.
> >
> > +<->BeginInternalSubTransaction(NULL);
> > +<->MemoryContextSwitchTo(TopTransactionContext);
> > +<->
>
> It turns out that was really an error in the first version of the patch.
> This context manipulation was copied from the PL/Python context manager by
> mistake.
>
> PL/Python changes to TopTransactionContext in order to manage its
> stack of subtransaction data (which we don't need in Tcl, because we
> can use the C language stack and store the necessary data in function
> automatic variables. Actually, Python doesn't need this stack either, because
> the Python context manager is a Python language object and can keep state
> inside it).
>
> Although we still need to save the MemoryContext and ResourceOwner and
> restore them when the subtransaction finishes.
>
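
A minimal C sketch of the approach described above - the caller's memory
context and resource owner are kept in C automatic variables, so no
TopTransactionContext bookkeeping is needed. This is an illustration only,
not the actual patch; argument checking and error handling are omitted:

    /*
     * Sketch only: Tcl command procedure for "subtransaction".
     * objv[1] is the Tcl script to run inside the subtransaction.
     */
    static int
    pltcl_subtransaction(ClientData cdata, Tcl_Interp *interp,
                         int objc, Tcl_Obj *const objv[])
    {
        MemoryContext oldcontext = CurrentMemoryContext;
        ResourceOwner oldowner = CurrentResourceOwner;
        int           retcode;

        BeginInternalSubTransaction(NULL);

        retcode = Tcl_EvalObjEx(interp, objv[1], 0);

        if (retcode == TCL_ERROR)
            RollbackAndReleaseCurrentSubTransaction();
        else
            ReleaseCurrentSubTransaction();

        /* Restore the caller's memory context and resource owner. */
        MemoryContextSwitchTo(oldcontext);
        CurrentResourceOwner = oldowner;

        return retcode;
    }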
>
is this patch complete? I don't see new regress tests

Regards

Pavel


> With best regards, Victor
>
> --
> Victor Wagner 
>


Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-09 Thread Victor Wagner
On Thu, 9 Mar 2017 09:25:14 +0100
Pavel Stehule  wrote:


> >
> is this patch complete? I don't see new regress tests

Oh, really! I'd forgotten that git diff doesn't include files which are
not added into git.

So, no old regress tests either.

Sorry for posting an incomplete patch.

Attached is a fixed version of the patch, with regress tests and a couple more
whitespace issues fixed.

 With best regards, Victor

--
Victor Wagner 


diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index ad216dd..87bc4ad 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -901,6 +901,79 @@ if {[catch { spi_exec $sql_command }]} {
  is a global variable.)
 

+   
+   Explicit Subtransactions
+  
+   Recovering from errors caused by database access as described in
+can lead to an undesirable
+   situation where some operations succeed before one of them fails,
+   and after recovering from that error the data is left in an
+   inconsistent state.  PL/Tcl offers a solution to this problem in
+   the form of explicit subtransactions.
+  
+   
+Consider a function that implements a transfer between two
+accounts:
+
+CREATE FUNCTION transfer_funds() RETURNS void AS $$
+if [catch {
+spi_exec "UPDATE accounts SET balance = balance - 100 WHERE account_name = 'joe'"
+spi_exec "UPDATE accounts SET balance = balance + 100 WHERE account_name = 'mary'"
+} errormsg] {
+set result [format "error transferring funds: %s" $errormsg]
+} else {
+set result "funds transferred correctly"
+}
+set plan [spi_prepare "INSERT INTO operations (result) VALUES ($1)"]
+spi_execp -count 1 $plan [list $result]
+$$ LANGUAGE pltclu;
+
+If the second UPDATE statement results in an
+exception being raised, this function will report the error, but
+the result of the first UPDATE will
+nevertheless be committed.  In other words, the funds will be
+withdrawn from Joe's account, but will not be transferred to
+Mary's account.
+   
+   
+To avoid such issues, you can wrap your
+spi_exec calls in an explicit
+subtransaction.  PL/Tcl provides a
+command subtransaction to manage explicit
+subtransactions.  Using explicit subtransactions,
+we can rewrite our function as:
+
+CREATE FUNCTION transfer_funds2() RETURNS void AS $$
+if [catch {
+    subtransaction {
+        spi_exec "UPDATE accounts SET balance = balance - 100 WHERE account_name = 'joe'"
+        spi_exec "UPDATE accounts SET balance = balance + 100 WHERE account_name = 'mary'"
+    }
+} errormsg] {
+    set result [format "error transferring funds: %s" $errormsg]
+} else {
+    set result "funds transferred correctly"
+}
+set plan [spi_prepare "INSERT INTO operations (result) VALUES ($1)"]
+spi_execp $plan [list $result]
+$$ LANGUAGE pltclu;
+
+Note that the use of catch is still
+required.  Otherwise the exception would propagate to the top of
+the stack and would cause the whole function to abort with
+a PostgreSQL error, so that the
+operations table would not have any row
+inserted into it.  The subtransaction command does not
+trap errors; it only ensures that all database operations executed
+inside its scope are atomically committed or rolled back.  A
+rollback of the subtransaction block occurs on any kind of
+exceptional exit, not only ones caused by errors originating from
+database access.  A regular Tcl exception raised inside an
+explicit subtransaction block would also cause the subtransaction
+to be rolled back.
+   
+  
 

 PL/Tcl Configuration
diff --git a/src/pl/tcl/Makefile b/src/pl/tcl/Makefile
index 1096c4f..b6b6b19 100644
--- a/src/pl/tcl/Makefile
+++ b/src/pl/tcl/Makefile
@@ -28,7 +28,7 @@ DATA = pltcl.control pltcl--1.0.sql pltcl--unpackaged--1.0.sql \
pltclu.control pltclu--1.0.sql pltclu--unpackaged--1.0.sql
 
 REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-extension=pltcl
-REGRESS = pltcl_setup pltcl_queries pltcl_start_proc pltcl_unicode
+REGRESS = pltcl_setup pltcl_queries pltcl_start_proc pltcl_unicode pltcl_subxact
 
 # Tcl on win32 ships with import libraries only for Microsoft Visual C++,
 # which are not compatible with mingw gcc. Therefore we need to build a
diff --git a/src/pl/tcl/expected/pltcl_subxact.out b/src/pl/tcl/expected/pltcl_subxact.out
new file mode 100644
index 000..5d984f1
--- /dev/null
+++ b/src/pl/tcl/expected/pltcl_subxact.out
@@ -0,0 +1,171 @@
+--
+-- Test explicit subtransactions
+--
+CREATE TABLE subtransaction_tbl (
+i integer
+);
+-- test subtransaction successfully committed
+CREATE FUNCTION subtransaction_ctx_success() RETURNS text
+AS $$
+    spi_exec "INSERT INTO subtransaction_tbl VALUES(1)"
+    subtransaction {
+        spi_exec "INSERT INTO subtransaction_tbl VALUES(2)"
+    }
+$$ LANGUAGE pltcl;
+BEGIN;
+INSERT INTO subtransaction_tbl VALUES(0);
+SELECT subtransaction_ctx_success();
+ subtransaction_ctx_success 
+

Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-09 Thread Pavel Stehule
Hi

2017-03-09 10:25 GMT+01:00 Victor Wagner :

> On Thu, 9 Mar 2017 09:25:14 +0100
> Pavel Stehule  wrote:
>
>
> > >
> > is this patch complete? I don't see new regress tests
>
> Oh, really! I'd forgotten that git diff doesn't include files which are
> not added into git.
>
> So, no old regress tests either.
>
> Sorry for posting an incomplete patch.
>
> Attached is a fixed version of the patch, with regress tests and a couple more
> whitespace issues fixed.
>

the regress tests are unstable

the proc name has a mutable pid

! (procedure "__PLTcl_proc_16503" line 3)
  invoked from within
! "__PLTcl_proc_16503 SPI"

Regards

Pavel


>  With best regards, Victor
>
> --
> Victor Wagner 
>
>
>


Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Erik Rijkers

On 2017-03-09 11:06, Erik Rijkers wrote:



File name:
logrep.20170309_1021.1.1043.scale_25.clients_64.NOK.log

20170309_1021 is the start-time of the script
1  is master (2 is replica)
1043 is the time, 10:43, just before the pg_waldump call


Sorry, that might be confusing.  That 10:43 is the time when the script
renames and copies the logfiles (not the waldump).



I meant to show the name of the waldump file:

waldump.20170309_1021_1039.1.5.000100270069.txt.bz2
where:

20170309_1021 is the start-time of the script
1  is master (2 is replica)
5 is wait-state cycles during which all 8 md5s remained the same
1039 is the time, 10:39, just before the pg_waldump call



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Erik Rijkers

On 2017-03-09 11:06, Erik Rijkers wrote:

On 2017-03-08 10:36, Petr Jelinek wrote:

On 07/03/17 23:30, Erik Rijkers wrote:

On 2017-03-06 11:27, Petr Jelinek wrote:


0001-Reserve-global-xmin-for-create-slot-snasphot-export.patch +
0002-Don-t-use-on-disk-snapshots-for-snapshot-export-in-l.patch+
0003-Prevent-snapshot-builder-xmin-from-going-backwards.patch  +
0004-Fix-xl_running_xacts-usage-in-snapshot-builder.patch  +
0005-Skip-unnecessary-snapshot-builds.patch+
0001-Logical-replication-support-for-initial-data-copy-v6.patch




The attached bz2 contains
- an output file from pgbench_derail2.sh (also attached, as it changes
somewhat all the time);

- the pg_waldump output from both master (file with .1. in it) and
replica (.2.).

- the 2 logfiles.




I forgot to include the bash-output file.  Now attached.  This file 
should have been in the bz2 I sent a few minutes ago.



= iteration 1 --   1 of 10 =

-- scale 25   clients 64   duration 300   CLEAN_ONLY=

-- hostname: barzoi
-- timestamp: 20170309_1021
-- master_start_time 2017-03-08 12:04:02.127127+01 replica_start_time 
2017-03-08 12:04:02.12713+01
-- master  patch-md5 [59c92165d4a328d68450ef0e922c0a42]
-- replica patch-md5 [59c92165d4a328d68450ef0e922c0a42] (ok)
-- synchronous_commit, master  [on]  replica [off]
-- master_assert  [on]  replica_assert [on]
-- self md5 87554cfed7cda67ad292b6481e1b8b41 ./pgbench_derail2.sh
clean-at-start-call
creating tables...
1699900 of 2500000 tuples (67%) done (elapsed 5.19 s, remaining 2.44 s)
2500000 of 2500000 tuples (100%) done (elapsed 7.51 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
create publication pub1 for all tables;
create subscription sub1 connection 'port=6972' publication pub1 with 
(disabled);
alter subscription sub1 enable;
-- pgbench -c 64 -j 8 -T 300 -P 60 -n   --  scale 25
progress: 60.0 s, 134.4 tps, lat 472.280 ms stddev 622.992
progress: 120.0 s, 26.4 tps, lat 2083.748 ms stddev 4356.546
progress: 180.0 s, 21.2 tps, lat 2977.751 ms stddev 4767.332
progress: 240.0 s, 13.5 tps, lat 5230.657 ms stddev 7029.718
progress: 300.0 s, 42.4 tps, lat 1555.645 ms stddev 1733.152
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 25
query mode: simple
number of clients: 64
number of threads: 8
duration: 300 s
number of transactions actually processed: 14336
latency average = 1342.222 ms
latency stddev = 3043.759 ms
tps = 47.383887 (including connections establishing)
tps = 47.385513 (excluding connections establishing)
-- waiting 0s... (always)
2017.03.09 10:27:56
-- getting md5 (cb)
6972 a,b,t,h:  2500000 25 250  14336  ee0f7bfd9 960d7d79c 3e8af1e9e 
cd2bd0395 master
6973 a,b,t,h:  2500000 25 250  14336  ee0f7bfd9 960d7d79c 3e8af1e9e 
cd2bd0395 replica ok  578113f12
2017.03.09 10:29:18
-- All is well.
-- 0 seconds total.  scale 25  clients 64  -T 300
-- waiting 20s, then end-cleaning
clean-at-end-call
sub_count -ne 0 : deleting sub1 (plain)
ERROR:  could not drop the replication slot "sub1" on publisher
DETAIL:  The error was: ERROR:  replication slot "sub1" is active for PID 10569
sub_count -ne 0 : deleting sub1 (nodrop)
pub_count -ne 0 - deleting pub1
pub_repl_slot_count -ne 0 - deleting (sub1)
ERROR:  replication slot "sub1" is active for PID 10569

 pub_count  0
   pub_repl_slot_count  1
 sub_count  0
   sub_repl_slot_count  0

-- imperfect cleanup, pg_waldump to unclean.20170309_1021.txt.bz2, waiting 60 
s, then exit
-- testset.sh: waiting 10s...

= iteration 2 --   2 of 10 =

-- scale 25   clients 64   duration 300   CLEAN_ONLY=

-- hostname: barzoi
-- timestamp: 20170309_1021
-- master_start_time 2017-03-08 12:04:02.127127+01 replica_start_time 
2017-03-08 12:04:02.12713+01
-- master  patch-md5 [59c92165d4a328d68450ef0e922c0a42]
-- replica patch-md5 [59c92165d4a328d68450ef0e922c0a42] (ok)
-- synchronous_commit, master  [on]  replica [off]
-- master_assert  [on]  replica_assert [on]
-- self md5 87554cfed7cda67ad292b6481e1b8b41 ./pgbench_derail2.sh
clean-at-start-call
pub_repl_slot_count -ne 0 - deleting (sub1)
 pg_drop_replication_slot 
--
 
(1 row)

creating tables...
1596800 of 2500000 tuples (63%) done (elapsed 5.09 s, remaining 2.88 s)
2500000 of 2500000 tuples (100%) done (elapsed 7.88 s, remaining 0.00 s)
vacuum...
set primary keys...
done.
create publication pub1 for all tables;
create subscription sub1 connection 'port=6972' publication pub1 with 
(disabled);
alter subscription sub1 enable;
-- pgbench -c 64 -j 8 -T 300 -P 60 -n   --  scale 25
progress: 60.0 s, 129.0 tps, lat 493.130 ms stddev 635.654
progress: 120.0 s, 34.0 tps, lat 1679.787 ms stddev 3808.253
progre

Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-09 Thread Victor Wagner
On Thu, 9 Mar 2017 11:12:09 +0100
Pavel Stehule  wrote:


> the regress tests are unstable
> 
> the proc name has a mutable pid
> 
> ! (procedure "__PLTcl_proc_16503" line 3)
>   invoked from within
> ! "__PLTcl_proc_16503 SPI"
> 
> Regards

Really, I don't know what can be done with it, short of rewriting all
tests as tap-tests.

Definitely this patch is not the right place for reversing the design
decision of the PL/Tcl authors to add a numeric suffix to the proc names.
(It is not a PID; it is the OID of the function.)

Of course, I can catch all the errors inside the Tcl code and return
just a message, but it would significantly narrow the test functionality.

Now the tests demonstrate how errors left uncaught at the Tcl level interact
with the PostgreSQL error system.



 With best regards, Victor

--
 Victor Wagner 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Explicit subtransactions for PL/Tcl

2017-03-09 Thread Pavel Stehule
2017-03-09 11:45 GMT+01:00 Victor Wagner :

> On Thu, 9 Mar 2017 11:12:09 +0100
> Pavel Stehule  wrote:
>
>
> > the regress tests are unstable
> >
> > the proc name has a mutable pid
> >
> > ! (procedure "__PLTcl_proc_16503" line 3)
> >   invoked from within
> > ! "__PLTcl_proc_16503 SPI"
> >
> > Regards
>
> Really, I don't know what can be done with it, short of rewriting all
> tests as tap-tests.
>
> Definitely this patch is not the right place for reversing the design
> decision of the PL/Tcl authors to add a numeric suffix to the proc names.
> (It is not a PID; it is the OID of the function.)
>
> Of course, I can catch all the errors inside the Tcl code and return
> just a message, but it would significantly narrow the test functionality.
>
> Now the tests demonstrate how errors left uncaught at the Tcl level interact
> with the PostgreSQL error system.
>

you can catch the exception outside and write your own message

Pavel

>
>
>
>  With best regards, Victor
>
> --
>  Victor Wagner 
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins

2017-03-09 Thread David Rowley
On 9 March 2017 at 18:06, Ashutosh Bapat
 wrote:
>>>
>>> Here's the patch attached.
>>
>> Agreed. It seems like a good idea to keep that logic in a single location
>
> Ok.
>
>>
>> I've beaten your patch around a bit and come up with the attached.
>
> The new name merge_fdw_options() is shorter than the one I chose, but
> we are not exactly merging options for an upper relation since there
> isn't the other fpinfo to merge from. But I think we can live with
> merge_fdw_options().

Perhaps "combine" is a better word? I didn't really see a problem with
that. After I posted this I wished I'd made the function even more
generic to accept either side as NULL and make up the new fpinfo from
the non-NULL one, or merge if both were non-NULL.  I liked that way
much better than giving the function too much knowledge of what its
purpose actually is. It's more likely to get used for something else
in the future, which means there's less chance that someone else will
make the same mistake.

> Looks like you forgot to compile the patch. It gave error
>
> postgres_fdw.c: In function ‘merge_fdw_options’:
> postgres_fdw.c:4337:2: error: ‘RelOptInfo’ has no member named ‘server’
> postgres_fdw.c:4337:2: error: ‘RelOptInfo’ has no member named ‘server’

Seems I forgot to test with asserts enabled. Thanks for finding/fixing that.

> Once I fixed that, the testcases started showing an assertion failure,
> since fpinfo of a base relation can not have an outerrel. Fixed the
> assertion as well. If we are passing fpinfo's of joining relations,
> there's no need to have outerrel and innerrel in fpinfo of join.

Looks like you forgot to update the comment on the Assert()

> Also, you had removed comment
> /*
> + * Set the foreign server to which this join will be shipped if found 
> safe
> + * to push-down. We need server specific options like extensions to 
> decide
> + * push-down safety, so set FDW specific options.
> + */
> But I think it's important to have it there, so that reader knows why
> merge_fdw_options() needs to be called before assessing quals.

No objections.

> I have
> updated it a bit to clarify this further. Also, merge_fdw_options()
> shouldn't set fpinfo->server since it's not an FDW option per se.
> It's better to keep that outside of this function. With your patch
> fpinfo->server was being set twice for an upper relation.

Thanks for noticing. I intended that not to be there, but seems I
forgot to hit delete.

>>
>> The changes from yours are mostly cosmetic, but I've also added a
>> regression test too.
>
> Thanks a lot for the test.
>
> PFA updated patch.

Thanks for making the changes.

I'd say it's ready to go, pending updating the out of date comment:

+ /* We must always have fpinfo_o and it must always have an outerrel */
+ Assert(fpinfo_o);


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-09 Thread Thomas Munro
On Wed, Mar 8, 2017 at 12:58 PM, Andres Freund  wrote:
> 0002: Check hash join work_mem usage at the point of chunk allocation.
>
> Modify the existing hash join code to detect work_mem exhaustion at
> the point where chunks are allocated, instead of checking after every
> tuple insertion.  This matches the logic used for estimating, and more
> importantly allows for some parallelism in later patches.
>
> diff --git a/src/backend/executor/nodeHash.c b/src/backend/executor/nodeHash.c
> index 406c180..af1b66d 100644
> --- a/src/backend/executor/nodeHash.c
> +++ b/src/backend/executor/nodeHash.c
> @@ -48,7 +48,8 @@ static void ExecHashSkewTableInsert(HashJoinTable hashtable,
> int bucketNumber);
>  static void ExecHashRemoveNextSkewBucket(HashJoinTable hashtable);
>
> -static void *dense_alloc(HashJoinTable hashtable, Size size);
> +static void *dense_alloc(HashJoinTable hashtable, Size size,
> +bool respect_work_mem);
>
> I still dislike this, but maybe Robert's point of:
>
> On 2017-02-16 08:57:21 -0500, Robert Haas wrote:
>> On Wed, Feb 15, 2017 at 9:36 PM, Andres Freund  wrote:
>> > Isn't it kinda weird to do this from within dense_alloc()?  I mean that
>> > dumps a lot of data to disk, frees a bunch of memory and so on - not
>> > exactly what "dense_alloc" implies.  Isn't the free()ing part also
>> > dangerous, because the caller might actually use some of that memory,
>> > like e.g. in ExecHashRemoveNextSkewBucket() or such.  I haven't looked
>> > deeply enough to check whether that's an active bug, but it seems like
>> > inviting one if not.
>>
>> I haven't looked at this, but one idea might be to just rename
>> dense_alloc() to ExecHashBlahBlahSomething().  If there's a real
>> abstraction layer problem here then we should definitely fix it, but
>> maybe it's just the angle at which you hold your head.
>
> Is enough.

There is a problem here.  It can determine that it needs to increase
the number of batches, effectively splitting the current batch, but
then the caller goes on to insert the current tuple anyway, even
though it may no longer belong in this batch.  I will post a fix for
that soon.  I will also refactor it so that it doesn't do that work
inside dense_alloc.  You're right, that's too weird.
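
For illustration, one shape such a fix could take - a sketch under assumed
control flow, not the actual forthcoming patch: after an allocation triggers
a batch split, re-derive the tuple's batch and spill it if it no longer
belongs in memory.

    /* Sketch: recheck the tuple's batch after a split increased nbatch. */
    ExecHashGetBucketAndBatch(hashtable, hashvalue, &bucketno, &batchno);
    if (batchno != hashtable->curbatch)
    {
        /* Tuple now belongs to a later batch: write it to its batch file. */
        ExecHashJoinSaveTuple(HJTUPLE_MINTUPLE(hashTuple), hashvalue,
                              &hashtable->innerBatchFile[batchno]);
    }
    else
    {
        /* Tuple still belongs to the current batch: keep it in memory. */
    }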

In the meantime, here is a new patch series addressing the other
things you raised.

> 0003: Scan for unmatched tuples in a hash join one chunk at a time.
>
>
> @@ -1152,8 +1155,65 @@ bool
>  ExecScanHashTableForUnmatched(HashJoinState *hjstate, ExprContext *econtext)
>  {
> HashJoinTable hashtable = hjstate->hj_HashTable;
> -   HashJoinTuple hashTuple = hjstate->hj_CurTuple;
> +   HashJoinTuple hashTuple;
> +   MinimalTuple tuple;
> +
> +   /*
> +* First, process the queue of chunks holding tuples that are in 
> regular
> +* (non-skew) buckets.
> +*/
> +   for (;;)
> +   {
> +   /* Do we need a new chunk to scan? */
> +   if (hashtable->current_chunk == NULL)
> +   {
> +   /* Have we run out of chunks to scan? */
> +   if (hashtable->unmatched_chunks == NULL)
> +   break;
> +
> +   /* Pop the next chunk from the front of the queue. */
> +   hashtable->current_chunk = 
> hashtable->unmatched_chunks;
> +   hashtable->unmatched_chunks = 
> hashtable->current_chunk->next;
> +   hashtable->current_chunk_index = 0;
> +   }
> +
> +   /* Have we reached the end of this chunk yet? */
> +   if (hashtable->current_chunk_index >= 
> hashtable->current_chunk->used)
> +   {
> +   /* Go around again to get the next chunk from the 
> queue. */
> +   hashtable->current_chunk = NULL;
> +   continue;
> +   }
> +
> +   /* Take the next tuple from this chunk. */
> +   hashTuple = (HashJoinTuple)
> +   (hashtable->current_chunk->data + 
> hashtable->current_chunk_index);
> +   tuple = HJTUPLE_MINTUPLE(hashTuple);
> +   hashtable->current_chunk_index +=
> +   MAXALIGN(HJTUPLE_OVERHEAD + tuple->t_len);
> +
> +   /* Is it unmatched? */
> +   if (!HeapTupleHeaderHasMatch(tuple))
> +   {
> +   TupleTableSlot *inntuple;
> +
> +   /* insert hashtable's tuple into exec slot */
> +   inntuple = ExecStoreMinimalTuple(tuple,
> + 
>hjstate->hj_HashTupleSlot,
> + 
>false); /* do not pfree */
> +   econtext->ecxt_inn

Re: [HACKERS] [PATCH] kNN for btree

2017-03-09 Thread Alexander Korotkov
On Thu, Mar 2, 2017 at 5:57 PM, David Steele  wrote:

> Hi Alexander,
>
> On 2/16/17 11:20 AM, Robert Haas wrote:
> > On Thu, Feb 16, 2017 at 10:59 AM, Tom Lane  wrote:
> >> Robert Haas  writes:
> >>> On Thu, Feb 16, 2017 at 8:05 AM, Alexander Korotkov
> >>>  wrote:
>  My idea is that we need more general redesign of specifying ordering
> which
>  index can produce.  Ideally, we should replace amcanorder,
> amcanbackward and
>  amcanorderbyop with single callback.  Such callback should take a
> list of
>  pathkeys and return number of leading pathkeys index could satisfy
> (with
>  corresponding information for index scan).  I'm not sure that other
> hackers
>  would agree with such design, but I'm very convinced that we need
> something
>  of this level of extendability.  Otherwise we would have to hack our
> planner
>  <-> index_access_method interface each time we decide to cover
> another index
>  produced ordering.
> >>
> >>> Yeah.  I'm not sure if that's exactly the right idea.  But it seems
> >>> like we need something.
> >>
> >> That's definitely not exactly the right idea, because using it would
> >> require the core planner to play twenty-questions trying to guess which
> >> pathkeys the index can satisfy.  ("Can you satisfy some prefix of this
> >> pathkey list?  How about that one?")  It could be sensible to have a
> >> callback that's called once per index and hands back a list of pathkey
> >> lists that represent interesting orders the index could produce, which
> >> could be informed by looking aside at the PlannerInfo contents to see
> >> what is likely to be relevant to the query.
> >>
> >> But even so, I'm not convinced that that is a better design or more
> >> maintainable than the current approach.  I fear that it will lead to
> >> duplicating substantial amounts of code and knowledge into each index
> AM,
> >> which is not an improvement; and if anything, that increases the risk of
> >> breaking every index AM anytime you want to introduce some fundamentally
> >> new capability in the area.  Now that it's actually practical to have
> >> out-of-core index AMs, that's a bigger concern than it might once have
> >> been.
> >
> > Yeah, that's all true.  But I think Alexander is right that just
> > adding amcandoblah flags ad infinitum doesn't feel good either.  The
> > interface isn't really arm's-length if every new thing somebody wants
> > to do something new requires another flag.
> >
> >> Also see the discussion that led up to commit ed0097e4f.  Users objected
> >> the last time we tried to make index capabilities opaque at the SQL
> level,
> >> so they're not going to like a design that tries to hide that
> information
> >> even from the core C code.
> >
> > Discoverability is definitely important, but first we have to figure
> > out how we're going to make it work, and then we can work out how to
> > let users see how it works.
>
> Reading through this thread I'm concerned that this appears to be a big
> change making its first appearance in the last CF.  There is also the
> need for a new patch and a general consensus of how to proceed.
>

Yes, refactoring of amcanorder/amcanorderbyop should be done very thoughtfully.

I recommend moving this patch to 2017-07 or marking it RWF.
>

I agree. Done.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] New CORRESPONDING clause design

2017-03-09 Thread Surafel Temesgen
Hi ,

Here is a patch corrected per your feedback, except for the missed test cases,
because the CORRESPONDING BY clause is implemented on top of set operations and
you can't do that to a set operation without a CORRESPONDING BY clause either.



E.g.:


postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

ERROR:  each UNION query must have the same number of columns

LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;

  ^

postgres=# create table t1(a int, b int, c int);

CREATE TABLE

postgres=# create table t2(a int, b int);

CREATE TABLE



postgres=# select * from t1 union select * from t2;

ERROR:  each UNION query must have the same number of columns

LINE 1: select * from t1 union select * from t2;



Thanks

Surafel

On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule 
wrote:

> Hi
>
> I am sending a review of this interesting feature.
>
> I found following issues, questions:
>
> 1. unclosed tags in the documentation
> 2. bad name "changeTargetEntry" - should it be makeTargetEntry?
> 3. Why did you remove a lot of asserts in prepunion.c? These asserts should
> still be valid
> 4. make_coresponding_target has wrong formatting
> 5. the error "%s queries with a CORRESPONDING clause must have at least one
> column with the same name" has wrong formatting; you can show the position
> 6. the previous issue is repeated - look at the formatting of the ereport
> function, please; you can use the DETAIL and HINT fields
> 7. the corresponding clause should contain a column list (I am looking at
> ANSI/SQL 99) - you are using expr_list, which does not make sense and probably
> has an impact on the whole implementation.
> 8. typo: orderCorrespondingLsit(List *targetlist)
> 9. I miss more tests for CORRESPONDING BY
> 10. if I understand this feature, this query should work
>
> postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 
> 6 c, 8 d;
> ERROR:  each UNION query must have the same number of columns
> LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...
>
> postgres=# create table t1(a int, b int, c int);
> CREATE TABLE
> Time: 63,260 ms
> postgres=# create table t2(a int, b int);
> CREATE TABLE
> Time: 57,120 ms
> postgres=# select * from t1 union corresponding select * from t2;
> ERROR:  each UNION query must have the same number of columns
> LINE 1: select * from t1 union corresponding select * from t2;
>
> If this is your first patch to Postgres, then it is perfect work!
>
> The @7 is probably the most significant - I dislike an expression list there.
> name_list would be better there.
>
> Regards
>
> Pavel
>
>


corresponding_clause_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-03-09 Thread Ashutosh Bapat
>
> +if (rel->partial_pathlist != NIL &&
> +(Path *) linitial(rel->partial_pathlist) == subpath)
> +partial_subplans_set = bms_add_member(partial_subplans_set, i);
>
> This seems like a scary way to figure this out.  What if we wanted to
> build a parallel append subpath with some path other than the
> cheapest, for some reason?  I think you ought to record the decision
> that set_append_rel_pathlist makes about whether to use a partial path
> or a parallel-safe path, and then just copy it over here.
>

I agree that assuming that a subpath is non-partial path if it's not
cheapest of the partial paths is risky. In fact, we can not assume
that even when it's not one of the partial_paths since it could have
been kicked out or was never added to the partial path list like
reparameterized path. But if we have to save the information about
which of the subpaths are partial paths and which are not in
AppendPath, it would take some memory, noticeable for thousands of
partitions, which will leak if the path doesn't make it into the
rel->pathlist. The purpose of that information is to make sure that we
allocate only one worker to that plan. I suggested that we use
path->parallel_workers for the same, but it seems that's not
guaranteed to be reliable. The reasons were discussed upthread. Is
there any way to infer whether we can allocate more than one workers
to a plan by looking at the corresponding path?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Performance issue after upgrading from 9.4 to 9.6

2017-03-09 Thread Naytro Naytro
Hi,

We are having some performance issues after we upgraded to newest
version of PostgreSQL, before it everything was fast and smooth.

Upgrade was done by pg_upgrade from 9.4 directly do 9.6.1. Now we
upgraded to 9.6.2 with no improvement.

Some information about our setup: Freebsd, Solaris (SmartOS), simple
master-slave using streaming replication.

Problem:
Very high system CPU when master is streaming replication data, CPU
goes up to 77%. Only one process is generating this load: the
postgresql startup process. When I attached truss to this process I
saw a lot of read calls with almost the same number of errors (EAGAIN).

root(at)d8:~ # truss -c -p 38091
^Csyscall seconds   calls  errors
semop   0.001611782 198   0
write   0.74404   2   0
read    2.281535100   17266   12375
openat  0.000683532  48   0
lseek   0.177612479   20443   0
close   0.000395549  48   0
  - --- ---
2.461912846   38005   12375

read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f10000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f10000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
lseek(778,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
lseek(818,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'

Descriptor 6 is a pipe

The read call tries to read one byte over and over. I looked at the source
code, and I think this file is responsible for this behavior:
src/backend/storage/ipc/latch.c. There was no such file in 9.4.
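
For context, a minimal sketch of the self-pipe drain pattern that latch.c
uses (simplified; the exact buffer size and function body in 9.6 may differ).
The read end of the pipe is non-blocking, so once the pipe is empty read()
returns -1 with EAGAIN - which is why truss shows so many small reads failing
with EAGAIN: every wakeup drains the pipe until it would block.

    /* Drain the self-pipe after a latch wakeup (sketch of drainSelfPipe()). */
    for (;;)
    {
        char    buf;
        int     rc = read(selfpipe_readfd, &buf, 1);

        if (rc < 0)
        {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;          /* pipe is empty: fully drained */
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            elog(ERROR, "read() on self-pipe failed: %m");
        }
        else if (rc == 0)
            elog(ERROR, "unexpected EOF on self-pipe");
    }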


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-03-09 Thread Ivan Kartyshov

Hello

On 07.03.2017 15:58, Masahiko Sawada wrote:

I've read the discussion so far but I didn't see the reason why you've
changed it to as a contrib module. Could you tell me about that?


I did it on the initiative of our customer, who preferred the
implementation in the form of a contrib module. The contrib realization of
WAITLSN adds only a single hook to core.


On 07.03.2017 15:58, Masahiko Sawada wrote:
> I guess this feature would be more useful if provided as a core
> feature and we need to discuss about syntax as Thomas mentioned.

Here I have attached the rebased patch waitlsn_10dev_v3 (core feature).
I will leave the choice of implementation (core/contrib) to the
discretion of the community.


I will be glad to hear your suggestions about the syntax, code, and patch.

--
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 2bc4d9f..6d5a81e 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -178,6 +178,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/waitlsn.sgml b/doc/src/sgml/ref/waitlsn.sgml
new file mode 100644
index 000..338187b
--- /dev/null
+++ b/doc/src/sgml/ref/waitlsn.sgml
@@ -0,0 +1,134 @@
+
+
+
+ 
+  WAITLSN
+ 
+
+ 
+  WAITLSN
+  7
+  SQL - Language Statements
+ 
+
+ 
+  WAITLSN
+  wait until target LSN has been replayed
+ 
+
+ 
+
+WAITLSN 'LSN' [ , delay ]
+WAITLSN_INFINITE 'LSN'
+WAITLSN_NO_WAIT 'LSN'
+
+ 
+
+ 
+  Description
+
+  
+   The WAITLSN command waits until the target LSN has
+   been replayed, with an optional delay in milliseconds (by
+   default infinite).
+  
+
+  
+   WAITLSN provides a simple
+   interprocess LSN wait mechanism for backends on the slave
+   in a master-slave replication scheme on a PostgreSQL database.
+  
+
+  
+   The WAITLSN_INFINITE command waits until the target LSN
+   has been replayed on the slave.
+  
+
+  
+   The WAITLSN_NO_WAIT command reports whether the target LSN has been replayed, without waiting.
+  
+ 
+
+ 
+  Parameters
+
+  
+   
+LSN
+
+ 
+  Target log sequence number to wait for.
+ 
+
+   
+   
+delay
+
+ 
+  Time in milliseconds to wait for the LSN to be replayed.
+ 
+
+   
+  
+ 
+
+ 
+  Notes
+
+  
+   The delay time for waiting until the LSN is replayed must be an
+   integer. By default it is infinite. Waiting can be interrupted using
+   Ctrl+C, or by postmaster death.
+  
+ 
+
+ 
+  GUCs
+
+  
+ Add two GUCs which help tune the influence on StartupXLOG:
+ count_waitlsn (denominator so that not every LSN is checked)
+ int	count_waitlsn = 10;
+  
+
+  
+ interval_waitlsn (interval in milliseconds between additional LSN checks)
+ int	interval_waitlsn = 100;
+  
+ 
+
+ 
+  Examples
+
+  
+   Configure and execute a waitlsn from
+   psql:
+
+
+WAITLSN '0/3F07A6B1', 1;
+NOTICE:  LSN is not reached. Try to make bigger delay.
+WAITLSN
+
+WAITLSN '0/3F07A611';
+WAITLSN
+
+WAITLSN '0/3F0FF791', 50;
+^CCancel request sent
+NOTICE:  LSN is not reached. Try to make bigger delay.
+ERROR:  canceling statement due to user request
+
+
+ 
+
+ 
+  Compatibility
+
+  
+   There is no WAITLSN statement in the SQL
+   standard.
+  
+ 
+
diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml
index c8191de..a5100a2 100644
--- a/doc/src/sgml/reference.sgml
+++ b/doc/src/sgml/reference.sgml
@@ -206,6 +206,7 @@
&update;
&vacuum;
&values;
+   &waitlsn;
 
  
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2dcff7f..9c2def1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -39,6 +39,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "commands/tablespace.h"
+#include "commands/waitlsn.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -145,6 +146,9 @@ const struct config_enum_entry sync_method_options[] = {
 	{NULL, 0, false}
 };
 
+/* GUC variable */
+intcount_waitlsn = 10;
+intinterval_waitlsn = 100;
 
 /*
  * Although only "on", "off", and "always" are documented,
@@ -6948,6 +6952,8 @@ StartupXLOG(void)
 		{
 			ErrorContextCallback errcallback;
 			TimestampTz xtime;
+			TimestampTz			time_waitlsn = GetCurrentTimestamp();
+			int	counter_waitlsn = 0;
 
 			InRedo = true;
 
@@ -7174,6 +7180,17 @@ StartupXLOG(void)
 	break;
 }
 
+/*
+ * After updating lastReplayedEndRecPtr, set latches in the shmem array
+ */
+if (counter_waitlsn % count_waitlsn == 0
+	|| TimestampDifferenceExceeds(time_waitlsn,GetCurrentTimestamp(),interval_waitlsn))
+{
+	WaitLSNSetLatch();
+	time_waitlsn = GetCurrentTimestamp();
+}
+counter_waitlsn++;
+
 /* Else, try to fetch the next WAL record */
 record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
 			} while (record != NULL);
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index

Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Wed, Mar 8, 2017 at 11:59 PM, Rushabh Lathia
 wrote:
> Here is another version of patch with the suggested changes.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
 wrote:
>>
>> +if (rel->partial_pathlist != NIL &&
>> +(Path *) linitial(rel->partial_pathlist) == subpath)
>> +partial_subplans_set = bms_add_member(partial_subplans_set, i);
>>
>> This seems like a scary way to figure this out.  What if we wanted to
>> build a parallel append subpath with some path other than the
>> cheapest, for some reason?  I think you ought to record the decision
>> that set_append_rel_pathlist makes about whether to use a partial path
>> or a parallel-safe path, and then just copy it over here.
>
> I agree that assuming that a subpath is non-partial path if it's not
> cheapest of the partial paths is risky. In fact, we can not assume
> that even when it's not one of the partial_paths since it could have
> been kicked out or was never added to the partial path list like
> reparameterized path. But if we have to save the information about
> which of the subpaths are partial paths and which are not in
> AppendPath, it would take some memory, noticeable for thousands of
> partitions, which will leak if the path doesn't make into the
> rel->pathlist.

True, but that's no different from the situation for any other Path
node that has substructure.  For example, an IndexPath has no fewer
than 5 list pointers in it.  Generally we assume that the number of
paths won't be large enough for the memory used to really matter, and
I think that will also be true here.  And an AppendPath has a list of
subpaths, and if I'm not mistaken, those list nodes consume more
memory than the tracking information we're thinking about here will.

I think you're thinking about this issue because you've been working
on partitionwise join where memory consumption is a big issue, but
there are a lot of cases where that isn't really a big deal.

> The purpose of that information is to make sure that we
> allocate only one worker to that plan. I suggested that we use
> path->parallel_workers for the same, but it seems that's not
> guaranteed to be reliable. The reasons were discussed upthread. Is
> there any way to infer whether we can allocate more than one workers
> to a plan by looking at the corresponding path?

I think it would be smarter to track it some other way.  Either keep
two lists of paths, one of which is the partial paths and the other of
which is the parallel-safe paths, or keep a bitmapset indicating which
paths fall into which category.  I am not going to say there's no way
we could make it work without either of those things -- looking at the
parallel_workers flag might be made to work, for example -- but the
design idea I had in mind when I put this stuff into place was that
you keep them separate in other ways, not by the data they store
inside them.  I think it will be more robust if we keep to that
principle.
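
For illustration, a sketch of the bitmapset variant (the names here -
partial_subpaths, subpath_is_partial - are assumptions, not existing fields
or functions): record the decision at the moment each subpath is chosen,
instead of re-deriving it later by comparing against the partial_pathlist.

    /* While assembling the append subpaths in set_append_rel_pathlist(): */
    Bitmapset  *partial_subpaths = NULL;    /* indexes of partial subpaths */
    ListCell   *lc;
    int         i = 0;

    foreach(lc, subpaths)
    {
        Path       *subpath = (Path *) lfirst(lc);

        if (subpath_is_partial(subpath))    /* the choice made when picked */
            partial_subpaths = bms_add_member(partial_subpaths, i);
        i++;
    }

    /*
     * Carry partial_subpaths over to the parallel append path, so the
     * executor can cap each non-partial subplan at a single worker.
     */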

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-09 Thread Michael Paquier
On Thu, Mar 9, 2017 at 1:17 AM, Joe Conway  wrote:
> On 03/07/2017 08:29 PM, Tom Lane wrote:
>> Michael Paquier  writes:
>>> here is a separate thread dedicated to the following extension for
>>> CREATE/ALTER ROLE: PASSWORD ('value' USING 'method').
>>
>> The parentheses seem weird ... do we really need those?
>
> +1

Seeing 3 opinions in favor of that, let's do so then. I have updated
the patch to not use parentheses.

>> +If you do not plan to use password authentication you can omit this
>> +option. The methods supported are md5 to enforce a 
>> password
>> +to be MD5-encrypted, scram for a SCRAM-encrypted 
>> password
>> +and plain for an unencrypted password.  If the password
>
> Can we please stop calling this encryption? What is being done is a form
> of cryptographic hashing, not encryption.

Yes, I agree with that for MD5, and after looking around I can see
(like here http://prosody.im/doc/plain_or_hashed) as well that
SCRAM-hashed is used. Now, there are also references to the salt,
like in protocol.sgml:
"The salt to use when encrypting the password."
Joe, do you think that in this case using the term "hashing" would be
more appropriate? I would think so as we use it to hash the password.

The attached patch removes the parentheses from this grammar, and uses
"hashed" instead of "encrypted" for the new documentation. For the
existing documentation, perhaps we had better just spawn a new thread,
but I am unsure of all the details yet. Opinions welcome.
-- 
Michael


0001-Add-clause-PASSWORD-val-USING-protocol-to-CREATE-ALT.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Gather Merge

2017-03-09 Thread Rushabh Lathia
On Thu, Mar 9, 2017 at 6:19 PM, Robert Haas  wrote:

> On Wed, Mar 8, 2017 at 11:59 PM, Rushabh Lathia
>  wrote:
> > Here is another version of patch with the suggested changes.
>
> Committed.
>
>
Thanks Robert for committing this.

My colleague Neha Sharma found one regression with the patch. I was about
to send this mail and noticed that you committed the patch.

Here is the small example:

Test setup:

1) ./db/bin/pgbench postgres -i -F 100 -s 20

2) update pgbench_accounts set filler = 'foo' where aid%10 = 0;

3) vacuum analyze pgbench_accounts

4)

postgres=# set max_parallel_workers_per_gather = 4;
SET

postgres=# explain select aid from pgbench_accounts where aid % 25= 0 group
by aid;
ERROR:  ORDER/GROUP BY expression not found in targetlist

postgres=# set enable_indexonlyscan = off;
SET
postgres=# explain select aid from pgbench_accounts where aid % 25= 0 group
by aid;
   QUERY
PLAN

 Group  (cost=44708.21..45936.81 rows=10001 width=4)
   Group Key: aid
   ->  Gather Merge  (cost=44708.21..45911.81 rows=1 width=0)
 Workers Planned: 4
 ->  Group  (cost=43708.15..43720.65 rows=2500 width=4)
   Group Key: aid
   ->  Sort  (cost=43708.15..43714.40 rows=2500 width=4)
 Sort Key: aid
 ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..43567.06 rows=2500 width=4)
   Filter: ((aid % 25) = 0)
(10 rows)


- Index-only scan with GM does work - but with an ORDER BY clause:
postgres=# set enable_seqscan = off;
SET
postgres=# explain select aid from pgbench_accounts where aid % 25= 0 order
by aid;
 QUERY
PLAN
-
 Gather Merge  (cost=1000.49..113924.61 rows=10001 width=4)
   Workers Planned: 4
   ->  Parallel Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.43..111733.33 rows=2500 width=4)
 Filter: ((aid % 25) = 0)
(4 rows)

Debugging further, I found the problem occurs only with an IndexOnlyScan under
GM, and only with grouping. While debugging, I found that ressortgroupref is
not getting set. That led me to think it might be because
create_gather_merge_plan() is building the tlist with build_path_tlist.
Another problem I found is that create_grouping_paths() is passing NULL for
the target while calling create_gather_merge_path(). (fix_target_gm.patch)

With those changes the above test runs fine, but they broke other things,
like:

postgres=# explain select distinct(bid) from pgbench_accounts where filler
like '%foo%' group by aid;
ERROR:  GatherMerge child's targetlist doesn't match GatherMerge

Another thing I noticed is that if we stop adding Gather Merge paths in
generate_gather_paths(), then all the tests work fine.
(comment_gm_from_generate_gather.patch)

I will continue looking at this problem.

Attaching both the patch for reference.


regards,
Rushabh Lathia
www.EnterpriseDB.com


comment_gm_from_generate_gather.patch
Description: application/download


fix_target_gm.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib modules and relkind check

2017-03-09 Thread Stephen Frost
Amit, Michael,

* Amit Langote (langote_amit...@lab.ntt.co.jp) wrote:
> On 2017/03/09 11:51, Michael Paquier wrote:
> > On Wed, Mar 8, 2017 at 5:10 PM, Amit Langote
> >  wrote:
> >> On 2017/03/08 16:47, Michael Paquier wrote:
> >>> Only regular tables are tested as valid objects. Testing toast tables
> >>> is not worth the complication. Could you add as well a matview?
> >>
> >> Done in the attached updated patch.
> > 
> > +select count(*) > 0 from pg_visibility('matview');
> > + ?column?
> > +--
> > + f
> > +(1 row)
> > That's quite a generic name :) You may want to use less common names
> > in your tests.
> 
> I agree.  Updated patch attached.
> 
> > OK, I am marking that as ready for committer.

I'm starting to look over this; it seems pretty straightforward.

Barring issues, I should get it committed in probably an hour or two.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Should buffer of initialization fork have a BM_PERMANENT flag

2017-03-09 Thread Artur Zakirov

Hello,

I wanted to review the patch, but the patch applies with errors. I've
rebased a local copy and have done the review on it. I'm not sure whether it
is proper for a reviewer to send a rebased patch, so I haven't sent it, to
avoid confusion.


On 29.01.2017 17:00, Michael Paquier wrote:

Attached is what I have in mind for HEAD. btree, gist, spgist and
bloom indexes are changed so that the init forks created go through the
shared buffers instead of having their empty() routines handle the
flush of the page created. This removes any kind of race conditions
between the checkpointer and the init fork creations, which is I think
a good thing.


I think these are good fixes. I've checked them, and in my opinion they
are correct.


The code also looks good.



Here are the tests I have done.
First running those commands to create all types of indexes.
create extension bloom;
create extension btree_gist;
create extension btree_gin;
create unlogged table foo (a int);
create index foo_bt on foo(a);
create index foo_bloom on foo using bloom(a);
create index foo_gin on foo using gin (a);
create index foo_gist on foo using gist (a);
create index foo_brin on foo using brin (a);
create unlogged table foo_geo (a box);
create index foo_spgist ON foo_geo using spgist(a);
checkpoint;

Then crash the server, restart it, and the following vacuums are able
to complete.
vacuum foo;
vacuum foo_geo;



I've done these tests. Before the patch, the server crashes on the vacuum
command. After applying the patch, the server doesn't crash on it.


I have run regression and TAP tests. They all passed without error.

I think the patch can be marked as "Ready for Committer" after rebase.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling replication connections by default in pg_hba.conf

2017-03-09 Thread Peter Eisentraut
On 3/8/17 02:34, Michael Paquier wrote:
> This patch looks good to me.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] background sessions

2017-03-09 Thread Peter Eisentraut
On 3/8/17 14:22, Pavel Stehule wrote:
> 1. will the background session process be closed automatically when the parent
> process is closed?

If the communications queue goes away the process will eventually die.
This is similar to how a backend process will eventually die if the
client goes away.  Some more testing would be good here.

> 2. what timeouts are valid for this process - statement timeout, idle in
> transaction timeout 

Those should work the same way.  It's the same code that runs the
queries, starts/stops transactions, etc.

> I see significant risk on leaking sessions.

Yeah, that's a valid concern.  But I think it works ok.

> There can be more doc and examples in plpython doc. It will be main
> interface for this feature. Mainly about session processing.

OK, I'll look into that again.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] kNN for SP-GiST

2017-03-09 Thread Alexander Korotkov
Hi, Nikita!

I take a look to this patchset.  My first notes are following.

* 0003-Extract-index_store_orderby_distances-v02.patch

Function index_store_orderby_distances doesn't look general enough for its
name.  GiST supports ordering only by float4 and float8 distances.  SP-GiST
also goes that way.  But in the future, access methods could support
distances of other types.
Thus, I suggest renaming the function to
 index_store_float8_orderby_distances().

* 0004-Add-kNN-support-to-SP-GiST-v02.patch

The patch didn't apply cleanly.

patching file doc/src/sgml/spgist.sgml
patching file src/backend/access/spgist/Makefile
patching file src/backend/access/spgist/README
patching file src/backend/access/spgist/spgscan.c
Hunk #5 succeeded at 242 with fuzz 2.
Hunk #11 FAILED at 861.
1 out of 11 hunks FAILED -- saving rejects to file
src/backend/access/spgist/spgscan.c.rej
patching file src/backend/access/spgist/spgtextproc.c
patching file src/backend/access/spgist/spgutils.c
Hunk #3 succeeded at 65 (offset 1 line).
Hunk #4 succeeded at 916 (offset 1 line).
patching file src/backend/access/spgist/spgvalidate.c
patching file src/include/access/spgist.h
patching file src/include/access/spgist_private.h
Hunk #4 FAILED at 191.
Hunk #5 succeeded at 441 (offset -230 lines).
1 out of 5 hunks FAILED -- saving rejects to file
src/include/access/spgist_private.h.rej

I had to apply the failed hunks manually.  While trying to compile it, I
faced a compile error.

spgtextproc.c:89:7: error: no member named 'suppLen' in 'struct
spgConfigOut'
cfg->suppLen = 0; /* we don't need any supplimentary data */
~~~  ^

I noticed that this line was removed in the next patch.  After removing it,
I got the following error.

make[4]: *** No rule to make target `spgproc.o', needed by `objfiles.txt'.
Stop.

After removing spgproc.o from src/backend/access/spgist/Makefile, I finally
succeeded in compiling it.


*** AUTHORS
> *** 371,373 
> --- 376,379 
>
>   Teodor Sigaev 
>   Oleg Bartunov 
> + Vlad Sterzhanov 


I don't think it is worth mentioning here a GSoC student who made just a dirty
prototype and abandoned this work right after the final evaluation.

More generally, you switched SP-GiST from a scan stack to a pairing heap.  We
should check whether it introduces overhead for non-KNN scans.  Also, some
benchmarks comparing KNN SP-GiST vs KNN GiST would be good now.

*
0005-Add-ordering-operators-to-SP-GiST-kd_point_ops-and-quad_point_ops-v02.patch

I faced following warning.

spgproc.c:32:10: warning: implicit declaration of function 'get_float8_nan'
is invalid in C99 [-Wimplicit-function-declaration]
return get_float8_nan();
   ^
1 warning generated.

* 0006-Add-spg_compress-method-to-SP-GiST-v02.patch
* 0007-Add-SP-GiST-opclasses-poly_ops-and-circle_ops-v02.patch

I think this is out of scope for KNN SP-GiST.  This is very valuable, but a
completely separate feature, and it should be posted and discussed
separately.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Automatic cleanup of oldest WAL segments with pg_receivexlog

2017-03-09 Thread Peter Eisentraut
On 3/7/17 11:16, Robert Haas wrote:
> Well, if the problem you're trying to solve is "retain WAL for as long
> as possible without running out of disk space and having everything go
> kablooey", then it would solve that problem, and I think that's a very
> reasonable problem to want to solve.

Could be.  I'm not sure what that means for the presented patch, though.
 Or whether it addresses Michael's original use case at all.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]: fix bug in SP-GiST box_ops

2017-03-09 Thread Alexander Korotkov
On Wed, Feb 1, 2017 at 2:43 PM, Emre Hasegeli  wrote:

> > I think this patch is already in a good shape.
>
> I am sorry for introducing this bug.  This fix looks good to me as well.


I checked this patch too.  And it seems good to me as well.
Should we mark it as "ready for committer"?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] postgres_fdw IMPORT SCHEMA and partitioned tables

2017-03-09 Thread Stephen Frost
Greetings,

While reviewing Amit Langote's patch to handle partitioned tables
properly in various contrib modules (mostly by throwing an error since
things like pageinspect aren't going to work on the empty 'parent'
table), I went looking through contrib for other modules that do
something with relkind and noticed that postgres_fdw's IMPORT SCHEMA
would pull in the child tables (relkind = 'r') but would ignore the
parent table (relkind = 'P', or soon to be 'p', I guess).

I tend to view this as an issue which should be added to the open items
list and resolved before PG10 (though perhaps it could be done after
feature freeze), but I could see an argument that it should be just a
documented limitation of postgres_fdw and that adding such support would
be a new feature.

In any case, this seems like an issue that should be addressed one way
or the other, so I'll add it to the open items list.  I'm not planning
to work on fixing it myself, but if someone proposes a patch which looks
reasonable, I'll try to find time for it.
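
For anyone who picks this up: the table list that IMPORT FOREIGN SCHEMA sees
is built from a query that postgresImportForeignSchema() deparses, filtered
by relkind.  A hedged sketch of the kind of one-line change implied - the
exact filter text in postgres_fdw.c may differ, so treat this as purely
illustrative:

/* hypothetical fragment; the real deparsed query may be worded differently */
appendStringInfoString(&buf,
                       "  AND c.relkind IN ('r', 'v', 'f', 'm', 'p')");
/* 'p' (partitioned tables) newly included alongside the existing kinds */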

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-09 Thread Robert Haas
On Fri, Feb 17, 2017 at 1:47 AM, Amit Langote
 wrote:
> On 2017/02/17 14:50, Peter Geoghegan wrote:
>> On Thu, Feb 16, 2017 at 9:27 PM, Amit Langote
>>  wrote:
>>> Attached patch fixes that.  Thom, your example query should not error out
>>> with the patch.  As discussed here, DO UPDATE cannot be supported at the
>>> moment.
>>
>> Maybe you should just let infer_arbiter_indexes() fail, rather than
>> enforcing this directly. IIRC, that's what happens with
>> inheritance-based partitioning.
>
> That would be another way.  The error message emitted by
> infer_arbiter_indexes() would be:
>
> ERROR:  there is no unique or exclusion constraint matching the ON
> CONFLICT specification
>
> It does read better than what proposed patch makes
> transformOnConflictClause() emit:
>
> ERROR:  ON CONFLICT ON UPDATE clause is not supported with partitioned tables
>
> I updated the patch.  Now it's reduced to simply removing the check in
> transformInsertStmt() that prevented using *any* ON CONFLICT on
> partitioned tables at all.
>
>
> I don't however see why the error would *necessarily* occur in the case of
> inheritance partitioning.  I mean, if one inserts into the root table in an
> inheritance hierarchy, it's still possible to ON CONFLICT DO UPDATE using
> the unique index on that table alone for inference, although that may not
> be what a user would intend to do.
>
> create table foo (a int, b int, unique (a));
> create table foo_part (like foo including indexes) inherits (foo);
> insert into foo values (1, 2);
>
> -- the following still works
>
> insert into foo values (1, 2)
>on conflict (a) do update set b = excluded.b where excluded.a = 1;
> insert into foo values (1, 2)
>on conflict (a) do update set b = excluded.b where excluded.a = 1;
>
> As the documentation about inheritance partitioning notes, that may not be
> the behavior expected for partitioned tables:
>
> 
>  INSERT statements with ON CONFLICT
>  clauses are unlikely to work as expected, as the ON CONFLICT
>  action is only taken in case of unique violations on the specified
>  target relation, not its child relations.
> 
>
> With partitioned tables, since it's not possible to create index
> constraints on them, ON CONFLICT DO UPDATE simply won't work.  So the
> patch also updates the note in the document about partitioned tables and
> ON CONFLICT.

This patch no longer applies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-03-09 Thread Robert Haas
On Mon, Feb 27, 2017 at 11:15 PM, Jim Nasby  wrote:
>> * You're considering any WAL file with a power of 2 as valid. Suppose,
>> the correct WAL seg size is 64mb. For some reason, the server
>> generated a 16mb invalid WAL file(maybe it crashed while creating the
>> WAL file). Your code seems to treat this as a valid file which I think
>> is incorrect. Do you agree with that?
>
> Detecting correct WAL size based on the size of a random WAL file seems like
> a really bad idea to me.

It's not the most elegant thing ever, but I'm not sure I really see a
big problem with it.  Today, if the WAL file were the wrong size, we'd
just error out.  With the patch, if the WAL file were the wrong size,
but happened to be a size that we consider legal, pg_waldump would
treat it as a legal file and try to display the WAL records contained
therein.  This doesn't seem like a huge problem from here; what are you
worried about?

I agree that it would be bad if, for example, pg_resetwal saw a broken
WAL file in pg_wal and consequently did the reset incorrectly, because
the whole point of pg_resetwal is to escape situations where the
contents of pg_wal may be bogus.  However, pg_resetwal can read the
value from the control file, so the technique of believing the file
size doesn't need to be used in that case anyway.  The only tools that
need to infer the WAL size from the sizes of the segments actually
present are those that neither have a running cluster (where SHOW can
be used) nor access to the control file.  There aren't many of those,
and pg_waldump, at least, is a debugging tool anyway.  IIUC, the other
case where this comes up is for pg_standby, but if the WAL segments
aren't all the same size that tool is presumably going to croak with
or without these changes, so I'm not really sure there's much of an
issue here.

I might be missing something.
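
For what it's worth, the power-of-two test itself is cheap and unambiguous.
A minimal sketch of the validity check being debated (the bounds here are
illustrative assumptions, not necessarily what the patch uses):

/* hypothetical: accept only power-of-two segment sizes in a sane range */
static bool
wal_seg_size_is_valid(uint64 size)
{
    return size >= 1024 * 1024 &&           /* at least 1MB */
           size <= 1024 * 1024 * 1024 &&    /* at most 1GB */
           (size & (size - 1)) == 0;        /* power of two */
}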

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 7:59 AM, Michael Paquier
 wrote:
> Yes, I agree with that for MD5, and after looking around I can see
> (like here http://prosody.im/doc/plain_or_hashed) as well that
> SCRAM-hashed is used. Now, there are as well references to the salt,
> like in protocol.sgml:
> "The salt to use when encrypting the password."
> Joe, do you think that in this case using the term "hashing" would be
> more appropriate? I would think so as we use it to hash the password.

I'm not Joe, but I think that would be more appropriate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC] Self introduction and question to mentors

2017-03-09 Thread Andrew Borodin
Hi, Kirill!

2017-03-06 12:41 GMT+05:00 Кирилл Бороздин :
> My name is Kirill Borozdin. I am a student of Ural Federal University from
> Yekaterinburg, Russia.
That's fine.  I'm an associate professor at Ural Federal University,
though not at your institution.

> I understand that this task will require a big number of experiments because
> we want to achieve
> a speed-up in real life cases.
>
> Can you recommend me some articles or specific source files in PostgreSQL
> codebase (it is so huge!)
> which I can explore to be better prepared (in case I will be lucky enough to
> be selected for GSoC)?

Well, this project is not that specific to PostgreSQL. Everything you
need to compile and use Postgres you can find on Wiki or just ask me
directly.

For starters, I'd recommend looking up some sorting-algorithm survey
papers on Google Scholar. There is a good paper by Goetz Graefe on
sorting, but it rather reflects the current situation than some future
algorithms :)

Something on cache-oblivious sorting, and on cache effects:
http://igoro.com/archive/gallery-of-processor-cache-effects/

Finally, the papers referenced on the GSoC wiki page for this project
are a good starting point.

As for tests, TPC is kind of ubiquitous.  We could just move the standard
pgbench script towards something more sorting-intensive.

Best regards, Andrey Borodin.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] partitioned tables and contrib/sepgsql

2017-03-09 Thread Stephen Frost
Greetings,

While going over the contrib modules, I noticed that sepgsql was not
updated for partitioned tables.  What that appears to mean is that it's
not possible to define labels on partitioned tables.  As I recall,
accessing the parent of a table will, similar to the GRANT system, not
perform checks against the child tables, meaning that there's no way to
have SELinux checks properly enforced when partitioned tables are being
used.

This is an issue which should be resolved for PG10, so I'll add it to
the open items list.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-09 Thread Mike Palmiotto
On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
> Greetings,
>
> While going over the contrib modules, I noticed that sepgsql was not
> updated for partitioned tables.  What that appears to mean is that it's
> not possible to define labels on partitioned tables.  As I recall,
> accessing the parent of a table will, similar to the GRANT system, not
> perform checks against the child tables, meaning that there's no way to
> have SELinux checks properly enforced when partitioned tables are being
> used.

I'll start taking a look at this. Presumably we'd just extend existing
object_access_hooks to cover partitioned tables?

>
> This is an issue which should be resolved for PG10, so I'll add it to
> the open items list.

I'll grab it. Thanks.

--Mike


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')

2017-03-09 Thread Joe Conway
On 03/09/2017 06:34 AM, Robert Haas wrote:
> On Thu, Mar 9, 2017 at 7:59 AM, Michael Paquier
>  wrote:
>> Yes, I agree with that for MD5, and after looking around I can see
>> (like here http://prosody.im/doc/plain_or_hashed) as well that
>> SCRAM-hashed is used. Now, there are as well references to the salt,
>> like in protocol.sgml:
>> "The salt to use when encrypting the password."
>> Joe, do you think that in this case using the term "hashing" would be
>> more appropriate? I would think so as we use it to hash the password.
> 
> I'm not Joe, but I think that would be more appropriate.


I am Joe and I agree ;-)


-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-09 Thread Stephen Frost
Mike,

* Mike Palmiotto (mike.palmio...@crunchydata.com) wrote:
> On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
> > While going over the contrib modules, I noticed that sepgsql was not
> > updated for partitioned tables.  What that appears to mean is that it's
> > not possible to define labels on partitioned tables.  As I recall,
> > accessing the parent of a table will, similar to the GRANT system, not
> > perform checks against the child tables, meaning that there's no way to
> > have SELinux checks properly enforced when partitioned tables are being
> > used.
> 
> I'll start taking a look at this. Presumably we'd just extend existing
> object_access_hooks to cover partitioned tables?

At least on first blush that seems like the right approach.  We'll need
to make sure that the SECURITY LABEL system will properly work with
partitioned tables too, of course, and that the checks are called when a
user queries a partitioned table.  Then we'll need regression tests to
make sure we get it all correct and don't screw it up in the future. ;)
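
For reference, sepgsql maps relkind to a security object class in a switch
in contrib/sepgsql; a hedged sketch of the kind of extension being suggested
(the exact location and surrounding code are assumptions, treat this as
illustrative only):

/* hypothetical fragment: treat partitioned tables like plain tables */
switch (classForm->relkind)
{
    case RELKIND_RELATION:
    case RELKIND_PARTITIONED_TABLE:     /* newly added case */
        tclass = SEPG_CLASS_DB_TABLE;
        break;
    /* ... other relkinds unchanged ... */
}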

> > This is an issue which should be resolved for PG10, so I'll add it to
> > the open items list.
> 
> I'll grab it. Thanks.

Excellent, thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-09 Thread Amit Kapila
On Thu, Mar 9, 2017 at 3:11 AM, Robert Haas  wrote:
> On Tue, Mar 7, 2017 at 6:41 PM, Robert Haas  wrote:
 Great, thanks.  0001 looks good to me now, so committed.
>>>
>>> Committed 0002.
>>
>> Here are some initial review thoughts on 0003 based on a first read-through.
>
> More thoughts on the main patch:
>
> +mode anyway).  It would seem natural to complete the split in VACUUM, but 
> since
> +splitting a bucket might require allocating a new page, it might fail if you
> +run out of disk space.  That would be bad during VACUUM - the reason for
> +running VACUUM in the first place might be that you run out of disk space,
> +and now VACUUM won't finish because you're out of disk space.  In contrast,
> +an insertion can require enlarging the physical file anyway.
>
> But VACUUM actually does try to complete the splits, so this is wrong, I 
> think.
>

Vacuum just tries to clean the remaining tuples from the old bucket.  Only
an insert or split operation tries to complete the split.

>
> +if (!xlrec.is_primary_bucket_page)
> +XLogRegisterBuffer(0, bucket_buf, REGBUF_STANDARD);
>
> So, in hashbucketcleanup, the page is registered and therefore an FPI
> will be taken if this is the first reference to this page since the
> checkpoint, but hash_xlog_delete won't restore the FPI.  I think that
> exposes us to a torn-page hazard.
> _hash_freeovflpage/hash_xlog_squeeze_page seems to have the same
> disease, as does _hash_squeezebucket/hash_xlog_move_page_contents.
>

Right, if we use XLogReadBufferForRedoExtended() instead of
XLogReadBufferExtended()/LockBufferForCleanup in the replay routine,
then it will restore the FPI when required and can serve our purpose of
acquiring a cleanup lock on the primary bucket page.  Yet another way could
be to store some information like block number, relfilenode, forknum
(maybe we can get away with less info) about the primary bucket in the
fixed part of each of these records, and then use that to read the page
and take a cleanup lock.  Now, since in most cases this buffer needs to
be registered only when there are multiple overflow pages, I think
carrying that fixed information might hurt in some cases.
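
To make the first option concrete, here is a minimal sketch of the
replay-side pattern (illustrative only, not the final patch):
XLogReadBufferForRedoExtended() with get_cleanup_lock = true both restores
the FPI when needed and takes a cleanup lock on the registered page:

/* hypothetical redo fragment for the primary bucket page (block 0) */
Buffer      bucketbuf;

if (XLogReadBufferForRedoExtended(record, 0, RBM_NORMAL,
                                  true,     /* get cleanup lock */
                                  &bucketbuf) == BLK_NEEDS_REDO)
{
    /* apply the logged changes, then mark the buffer dirty */
}
if (BufferIsValid(bucketbuf))
    UnlockReleaseBuffer(bucketbuf);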

>
> +/*
> + * As bitmap page doesn't have standard page layout, so this will
> + * allow us to log the data.
> + */
> +XLogRegisterBuffer(2, mapbuf, REGBUF_STANDARD);
> +XLogRegisterBufData(2, (char *) &bitmap_page_bit, 
> sizeof(uint32));
>
> If the page doesn't have a standard page layout, why are we passing
> REGBUF_STANDARD?  But I think you've hacked things so that bitmap
> pages DO have a standard page layout, per the change to
> _hash_initbitmapbuffer, in which case the comment seems wrong.
>

Yes, the comment is obsolete and I have removed it.  We need a standard
page layout for the bitmap page so that full page writes won't exclude
the bitmap page data.

>
> +/*
> + * Note that we still update the page even if it was restored from a full
> + * page image, because the bucket flag is not included in the image.
> + */
>
> Really?  Why not?  (Because it's part of the special space?)
>

Yes, because it's part of the special space.  Do you want that to be
mentioned in the comments?

> Doesn't hash_xlog_split_allocpage need to acquire cleanup locks to
> guard against concurrent scans?
>

I don't think so, because the regular code takes it on the old bucket to
protect the split from concurrent inserts, and the cleanup lock on the new
bucket is just for the sake of consistency.

> And also hash_xlog_split_complete?

In regular code, we are not taking cleanup lock for this operation.

> And hash_xlog_delete?  If the regular code path is getting a cleanup
> lock to protect against concurrent scans, recovery better do it, too.
>

We are already taking a cleanup lock for this; I'm not sure what exactly
is missed here.  Can you be more specific?

> +/*
> + * There is no harm in releasing the lock on old bucket before new bucket
> + * during replay as no other bucket splits can be happening concurrently.
> + */
> +if (BufferIsValid(oldbuf))
> +UnlockReleaseBuffer(oldbuf);
>
> And I'm not sure this is safe either.  The issue isn't that there
> might be another bucket split happening concurrently, but that there
> might be scans that get confused by seeing the bucket split flag set
> before the new bucket is created or the metapage updated.
>

During a scan we check for the bucket being populated, so this should be safe.

>  Seems like
> it would be safer to keep all the locks for this one.
>

If you feel better that way, I can change it and add a comment saying
something like "we could release the lock on the bucket being split early
as well, but we do it here to be consistent with normal operation".

>
> It seems like a good test to do with this patch would be to set up a
> pgbench test on the master with a hash index replacing the usual btree
> index.  Then, set up a standby and run a read-only pgbench 

Re: [HACKERS] [bug fix] dblink leaks unnamed connections

2017-03-09 Thread Fujii Masao
On Wed, Mar 8, 2017 at 3:52 PM, Tsunakawa, Takayuki
 wrote:
> Hello,
>
> dblink fails to close the unnamed connection as follows when a new unnamed 
> connection is requested.  The attached patch fixes this.

This issue was reported about ten years ago and added as a TODO item.
http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php

I agree that this is a bug, and am tempted to back-patch to all the supported
versions.  But it has not been fixed in the many years since the first report
of the issue, so I'm not sure whether it's OK to just treat this as a bug
right now and back-patch, or whether we should fix it only in HEAD.  Anyway,
I'd like to hear more opinions about this.
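
For readers without the patch at hand, the shape of the fix is simple.  A
hedged sketch (the attached patch may differ in detail; names follow
contrib/dblink conventions):

/* hypothetical fragment for dblink_connect(): close any previous
 * unnamed connection before installing the new one */
if (connname == NULL && pconn->conn != NULL)
{
    PQfinish(pconn->conn);
    pconn->conn = NULL;
}
pconn->conn = conn;         /* the newly established connection */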

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-09 Thread Amit Kapila
On Thu, Mar 9, 2017 at 12:25 AM, Robert Haas  wrote:
> On Wed, Mar 8, 2017 at 7:45 AM, Amit Kapila  wrote:
>> Okay, I can try, but note that currently there is no test related to
>> "snapshot too old" for any other indexes.
>
> Wow, that's surprising.  It seems the snapshot_too_old test only
> checks that this works for a table that has no indexes.  Have you,
> anyway, tested it manually?
>

Yes, I have tested it manually.  I think we need to ensure that the
modified tuple falls on the same page as the old tuple to make the test
work.  The slight difficulty with an index is ensuring that the modified
tuple is inserted into the same page as the old tuple; this is even more
true with hash indexes.  Also, for the heap, I think it relies on the hot
pruning stuff, whereas for an index we need to perform a manual vacuum.
Basically, we can write a test for an index if we want, but I am not sure
it is worth the pain to write one for hash indexes when the test for
btree is not there.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread David Rowley
I was just doing some testing on [1] when I noticed that there's a problem
with parallel bitmap index scans.

Test case:

patch with [1]

=# create table r1(value int);
CREATE TABLE
=# insert into r1 select (random()*1000)::int from generate_Series(1,100);
INSERT 0 100
=# create index on r1 using brin(value);
CREATE INDEX
=# set enable_seqscan=0;
SET
=# explain select * from r1 where value=555;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Gather  (cost=3623.52..11267.45 rows=5000 width=4)
   Workers Planned: 2
   ->  Parallel Bitmap Heap Scan on r1  (cost=2623.52..9767.45 rows=2083 width=4)
         Recheck Cond: (value = 555)
         ->  Bitmap Index Scan on r1_value_idx  (cost=0.00..2622.27 rows=522036 width=0)
               Index Cond: (value = 555)
(6 rows)

=# explain analyze select * from r1 where value=555;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The crash occurs in tbm_shared_iterate() at:

PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];


I see that in tbm_prepare_shared_iterate() tbm->npages is zero. I'm unsure
whether bringetbitmap() does something different from btgetbitmap() around
setting npages.

But anyway, due to the npages being 0 the tbm->ptpages is not allocated
in tbm_prepare_shared_iterate()

if (tbm->npages)
{
tbm->ptpages = dsa_allocate(tbm->dsa, sizeof(PTIterationArray) +
tbm->npages * sizeof(int));

so when tbm_shared_iterate runs this code;

/*
* If both chunk and per-page data remain, must output the numerically
* earlier page.
*/
if (istate->schunkptr < istate->nchunks)
{
PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
BlockNumber chunk_blockno;

chunk_blockno = chunk->blockno + istate->schunkbit;

if (istate->spageptr >= istate->npages ||
chunk_blockno < page->blockno)
{
/* Return a lossy page indicator from the chunk */
output->blockno = chunk_blockno;
output->ntuples = -1;
output->recheck = true;
istate->schunkbit++;

LWLockRelease(&istate->lock);
return output;
}
}

it fails, due to idxpages pointing to random memory

Probably this is a simple fix for the authors, so I'm passing it along; I
can't quite see how the part above is meant to work.

[1]
https://www.postgresql.org/message-id/attachment/50164/brin-correlation-v3.patch

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [bug fix] dblink leaks unnamed connections

2017-03-09 Thread Tom Lane
Fujii Masao  writes:
> On Wed, Mar 8, 2017 at 3:52 PM, Tsunakawa, Takayuki
>  wrote:
>> dblink fails to close the unnamed connection as follows when a new unnamed 
>> connection is requested.  The attached patch fixes this.

> This issue was reported about ten years ago and added as TODO item.
> http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php

> I agree that this is a bug, and am tempted to back-patch to all the supported
> versions. But it had not been fixed in many years since the first report of
> the issue. So I'm not sure if it's ok to just treat this as a bug right now 
> and
> back-patch. Or we should fix this only in HEAD? Anyway I'd like to hear
> more opinions about this.

It looks to me like the issue simply fell through the cracks because Joe
wasn't excited about fixing it.  Now that we have a second complaint,
I think it's worth treating as a bug and back-patching.

(I've not read this particular patch and am not expressing an opinion
whether it's correct.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Peter Eisentraut
On 2/28/17 22:15, Peter Eisentraut wrote:
> The SQL standard defines a separate error code for nextval exhausting
> the sequence space.  I haven't found any discussion of this in the
> archives, so it seems this was just not considered or not yet in
> existence when the error codes were introduced.  Here is a patch to
> correct it.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Stephen Frost
Greetings,

Consider this:

create table t10 (c1 int, c2 int);
create index on t10 (c1) where c2 > 5;

\d t10
  Table "sfrost.t10"
 Column |  Type   | Modifiers 
+-+---
 c1 | integer | 
 c2 | integer | 
Indexes:
"t10_c1_idx" btree (c1) WHERE c2 > 5

insert into t10 select * from generate_series(1,1) a, generate_series(1,10) b;
(repeat a bunch of times, if desired)

vacuum analyze t10;
set work_mem = '64kB';
set enable_indexscan = false;
set enable_seqscan = false;

=*> explain analyze select * from t10 where c2 > 6;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t10  (cost=6496.49..15037.50 rows=318653 width=8) (actual time=34.682..116.236 rows=32 loops=1)
   Recheck Cond: (c2 > 5)
   Rows Removed by Index Recheck: 327502
   Filter: (c2 > 6)
   Rows Removed by Filter: 8
   Heap Blocks: exact=642 lossy=2898
   ->  Bitmap Index Scan on t10_c1_idx  (cost=0.00..6416.83 rows=400081 width=0) (actual time=34.601..34.601 rows=40 loops=1)
 Planning time: 0.087 ms
 Execution time: 124.229 ms
(9 rows)

Perhaps I'm missing something obvious, but isn't it a bit redundant to
have both a Recheck condition (which is the predicate of the index) and
a Filter condition (which is the user's predicate) when we've already
decided that the user's predicate must result in a subset of the
index's, as, otherwise, we wouldn't be able to use the index in the
first place?

In other words, it seems like we shouldn't need a Filter in the above
Bitmap Heap Scan, instead we should just make the Recheck be (c2 > 6).

I've not looked into the code side of this at all and there may be
reasons why this is hard to do, but it seems like a worthwhile
improvement to consider doing, though perhaps I'm missing some reason
why we need both the Recheck and the Filter in such cases for
correctness.
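
For reference, the proof in question is done by predicate_implied_by() in
src/backend/optimizer/util/predtest.c when the planner decides whether the
partial index is usable at all.  Conceptually (signature simplified from
memory, so treat it as a sketch):

/* conceptual: the partial index is usable only if every row that
 * satisfies the query's quals also satisfies the index predicate */
bool        index_usable;

index_usable = predicate_implied_by(index->indpred, qual_clauses);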

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-09 Thread Aleksander Alekseev
Hi, Andres

Thanks a lot for the review!

> Why are we keeping that list / the "batch" allocation pattern? It
> doesn't actually seem useful to me after these changes.  Given that we
> don't shrink hash-tables, we could just use the hash-table memory for
> this, no?

I don't think we can do that. According to comments:

```
 * NOTE: once allocated, TabStatusArray structures are never moved or deleted
 [...]
 * This allows relcache pgstat_info pointers to be treated as long-lived data,
 * avoiding repeated searches in pgstat_initstats() when a relation is
 * repeatedly opened during a transaction.
```

It is my understanding that dynahash can't give the same guarantees.

> 'faster ... lookup' doesn't strike me as a useful comment, because it's
> only useful in relation of the current code to the new code.

Agree, fixed to 'O(1) ... lookup'.

> It's not really freed...

Right, it's actually zeroed. Fixed.

> Arguably it'd be better to HASH_ENTER directly here, instead of doing
> two lookups.

Good point. Fixed.

> Think it'd be better to move the hash creation into its own function.

Agree, fixed.
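
For the archives, a minimal sketch of what the extracted creation function
could look like, mirroring the hash_create() call quoted below (the helper
name here is illustrative, not necessarily what the patch uses):

/* hypothetical helper wrapping the quoted hash_create() call */
static void
make_tabstat_hash(void)
{
    HASHCTL     ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(Oid);
    ctl.entrysize = sizeof(TabStatHashEntry);
    ctl.hcxt = TopMemoryContext;

    pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup table",
                                TABSTAT_QUANTUM, &ctl,
                                HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}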

On Mon, Mar 06, 2017 at 12:01:50PM -0800, Andres Freund wrote:
> Hi,
> 
> This issue has bothered me in non-partitioned use-cases recently, so
> thanks for taking it up.
> 
> 
> On 2017-03-06 18:22:17 +0300, Aleksander Alekseev wrote:
> > diff --git a/src/backend/postmaster/pgstat.c 
> > b/src/backend/postmaster/pgstat.c
> > index 2fb9a8bf58..fa906e7930 100644
> > --- a/src/backend/postmaster/pgstat.c
> > +++ b/src/backend/postmaster/pgstat.c
> > @@ -160,6 +160,16 @@ typedef struct TabStatusArray
> >  
> >  static TabStatusArray *pgStatTabList = NULL;
> 
> Why are we keeping that list / the "batch" allocation pattern? It
> doesn't actually seem useful to me after these changes.  Given that we
> don't shrink hash-tables, we could just use the hash-table memory for
> this, no?
> 
> I think a separate list that only contains things that are "active" in
> the current transaction makes sense, because the current logic requires
> iterating over a potentially very large array at transaction commit.
> 
> 
> > +/* pgStatTabHash entry */
> > +typedef struct TabStatHashEntry
> > +{
> > +   Oid t_id;
> > +   PgStat_TableStatus* tsa_entry;
> > +} TabStatHashEntry;
> > +
> > +/* Hash table for faster t_id -> tsa_entry lookup */
> > +static HTAB *pgStatTabHash = NULL;
> > +
> 
> 'faster ... lookup' doesn't strike me as a useful comment, because it's
> only useful in relation of the current code to the new code.
> 
> 
> 
> >  /*
> >   * Backends store per-function info that's waiting to be sent to the 
> > collector
> >   * in this hash table (indexed by function OID).
> > @@ -815,7 +825,13 @@ pgstat_report_stat(bool force)
> > pgstat_send_tabstat(this_msg);
> > this_msg->m_nentries = 0;
> > }
> > +
> > +   /*
> > +* Entry will be freed soon so we need to remove it 
> > from the lookup table.
> > +*/
> > +   hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, 
> > NULL);
> > }
> 
> It's not really freed...
> 
> 
> >  static PgStat_TableStatus *
> >  get_tabstat_entry(Oid rel_id, bool isshared)
> >  {
> > +   TabStatHashEntry* hash_entry;
> > PgStat_TableStatus *entry;
> > TabStatusArray *tsa;
> > -   TabStatusArray *prev_tsa;
> > -   int i;
> > +
> > +   /* Try to find an entry */
> > +   entry = find_tabstat_entry(rel_id);
> > +   if(entry != NULL)
> > +   return entry;
> 
> Arguably it'd be better to HASH_ENTER directly here, instead of doing
> two lookups.
> 
> 
> > /*
> > -* Search the already-used tabstat slots for this relation.
> > +* Entry doesn't exist - creating one.
> > +* First make sure there is a free space in a last element of 
> > pgStatTabList.
> >  */
> > -   prev_tsa = NULL;
> > -   for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = 
> > tsa->tsa_next)
> > +   if (!pgStatTabList)
> > {
> > -   for (i = 0; i < tsa->tsa_used; i++)
> > -   {
> > -   entry = &tsa->tsa_entries[i];
> > -   if (entry->t_id == rel_id)
> > -   return entry;
> > -   }
> > +   HASHCTL ctl;
> >  
> > -   if (tsa->tsa_used < TABSTAT_QUANTUM)
> > +   Assert(!pgStatTabHash);
> > +
> > +   memset(&ctl, 0, sizeof(ctl));
> > +   ctl.keysize = sizeof(Oid);
> > +   ctl.entrysize = sizeof(TabStatHashEntry);
> > +   ctl.hcxt = TopMemoryContext;
> > +
> > +   pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup 
> > table",
> > +   TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | 
> > HASH_CONTEXT);
> 
> Think it'd be better to move the hash creation into its own function.
> 
> 
> - Andres

-- 
Best regards,
Aleksander Alekseev

Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar
On Thu, Mar 9, 2017 at 9:17 PM, David Rowley
 wrote:
> patch with [1]
>
> =# create table r1(value int);
> CREATE TABLE
> =# insert into r1 select (random()*1000)::int from generate_Series(1,100);
> INSERT 0 100
> =# create index on r1 using brin(value);
> CREATE INDEX
> =# set enable_seqscan=0;
> SET
> =# explain select * from r1 where value=555;

I am looking into the issue, I have already reproduced it.  I will
update on this soon.

Thanks for reporting.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning optimization for large amount of partitions

2017-03-09 Thread Aleksander Alekseev
Hi Amit,

> Sorry, I didn't mean to dissuade you from trying those
> micro-optimizations.  If general inheritance cases can benefit from it
> (which, until we have a different method, will be used by partitioned
> tables as well), I think we should try it.

OK, I'll see what could be done here as well then.

On Tue, Mar 07, 2017 at 10:55:12AM +0900, Amit Langote wrote:
> Hi Aleksander,
> 
> On 2017/03/07 0:22, Aleksander Alekseev wrote:
> > Hello.
> > 
> > OK, here is a patch.
> > 
> > Benchmark, before:
> > 
> > ```
> > number of transactions actually processed: 1823
> > latency average = 1153.495 ms
> > latency stddev = 154.366 ms
> > tps = 6.061104 (including connections establishing)
> > tps = 6.061211 (excluding connections establishing)
> > ```
> > 
> > Benchmark, after:
> > 
> > ```
> > number of transactions actually processed: 2462
> > latency average = 853.862 ms
> > latency stddev = 112.270 ms
> > tps = 8.191861 (including connections establishing)
> > tps = 8.192028 (excluding connections establishing)
> > ```
> > 
> > +35% TPS, just as expected. Feel free to run your own benchmarks on
> > different datasets and workloads. `perf top` shows that first bottleneck
> > is completely eliminated.
> 
> That seems like a good gain.
> 
> > I did nothing about the second bottleneck
> > since as Amit mentioned partition-pruning should solve this anyway and
> > apparently any micro-optimizations don't worth an effort.
> 
> Sorry, I didn't mean to dissuade you from trying those
> micro-optimizations.  If general inheritance cases can benefit from it
> (which, until we have a different method, will be used by partitioned
> tables as well), I think we should try it.
> 
> Thanks,
> Amit
> 
> 

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 8:21 AM, Rushabh Lathia  wrote:
> Thanks Robert for committing this.
>
> My colleague Neha Sharma found one regression with the patch. I was about
> to send this mail and noticed that you committed the patch.

Oops.  Bad timing.

> postgres=# explain select aid from pgbench_accounts where aid % 25 = 0 group by aid;
> ERROR:  ORDER/GROUP BY expression not found in targetlist

I think your fix for this looks right, although I would write it this way:

-gm_plan->plan.targetlist = subplan->targetlist;
+gm_plan->plan.targetlist = build_path_tlist(root, &best_path->path);

The second part of your fix looks wrong.  I think you want this:

 create_gather_merge_path(root,
  grouped_rel,
  subpath,
- NULL,
+ partial_grouping_target,
  root->group_pathkeys,
  NULL,
  &total_groups);

That will match the create_gather_path case.

This test case is still failing for me even with those fixes:

rhaas=# select aid+1 from pgbench_accounts group by aid+1;
ERROR:  could not find pathkey item to sort

So evidently there is at least one more bug here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New CORRESPONDING clause design

2017-03-09 Thread Pavel Stehule
2017-03-09 13:18 GMT+01:00 Surafel Temesgen :

> Hi ,
>
> Here is a patch corrected per your feedback, except for the missing test
> cases: the CORRESPONDING BY clause is implemented on top of set operations,
> and those cases fail for set operations without a CORRESPONDING BY clause too
>

I don't understand.

The following statement should work

postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30 as b, 40 as c;

ERROR:  each UNION query must have the same number of columns
LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...


The CORRESPONDING clause should work like a projection filter.

Regards

Pavel


>
> Eg
>
>
> postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>
> ERROR:  each UNION query must have the same number of columns
>
> LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>
>   ^
>
> postgres=# create table t1(a int, b int, c int);
>
> CREATE TABLE
>
> postgres=# create table t2(a int, b int);
>
> CREATE TABLE
>
>
>
> postgres=# select * from t1 union select * from t2;
>
> ERROR:  each UNION query must have the same number of columns
>
> LINE 1: select * from t1 union select * from t2;
>
>
>
> Thanks
>
> Surafel
>
> On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule 
> wrote:
>
>> Hi
>>
>> I am sending a review of this interesting feature.
>>
>> I found the following issues and questions:
>>
>> 1. unclosed tags  in documentation
>> 2. bad name "changeTargetEntry" - should be makeTargetEntry?
>> 3. Why did you remove a lot of asserts in prepunion.c? These asserts
>> should still be valid
>> 4. make_coresponding_target has wrong formatting
>> 5. error "%s queries with a CORRESPONDING clause must have at least one
>> column with the same name" has wrong formatting, you can show position
>> 6. previous issue is repeated - look on formatting ereport function,
>> please, you can use DETAIL and HINT fields
>> 7. the CORRESPONDING clause should contain a column list (I am looking at
>> ANSI/SQL 99) - you are using an expr_list, which makes no sense and probably
>> has an impact on the whole implementation.
>> 8. typo orderCorrespondingLsit(List *targetlist)
>> 9. I'd like to see more tests for CORRESPONDING BY
>> 10. if I understand this feature correctly, this query should work
>>
>> postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, 6 c, 8 d;
>> ERROR:  each UNION query must have the same number of columns
>> LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...
>>
>> postgres=# create table t1(a int, b int, c int);
>> CREATE TABLE
>> Time: 63,260 ms
>> postgres=# create table t2(a int, b int);
>> CREATE TABLE
>> Time: 57,120 ms
>> postgres=# select * from t1 union corresponding select * from t2;
>> ERROR:  each UNION query must have the same number of columns
>> LINE 1: select * from t1 union corresponding select * from t2;
>>
>> If it is your first patch to Postgres, then it is perfect work!
>>
>> The @7 is probably the most significant - I dislike an expression list
>> there; name_list would be better.
>>
>> Regards
>>
>> Pavel
>>
>>
>


Re: [HACKERS] Gather Merge

2017-03-09 Thread Rushabh Lathia
On Thu, Mar 9, 2017 at 9:42 PM, Robert Haas  wrote:

> On Thu, Mar 9, 2017 at 8:21 AM, Rushabh Lathia 
> wrote:
> > Thanks Robert for committing this.
> >
> > My colleague Neha Sharma found one regression with the patch. I was about
> > to send this mail and noticed that you committed the patch.
>
> Oops.  Bad timing.
>
> > postgres=# explain select aid from pgbench_accounts where aid % 25 = 0 group by aid;
> > ERROR:  ORDER/GROUP BY expression not found in targetlist
>
> I think your fix for this looks right, although I would write it this way:
>
> -gm_plan->plan.targetlist = subplan->targetlist;
> +gm_plan->plan.targetlist = build_path_tlist(root, &best_path->path);
>
> The second part of your fix looks wrong.  I think you want this:
>
>  create_gather_merge_path(root,
>   grouped_rel,
>   subpath,
> - NULL,
> + partial_grouping_target,
>   root->group_pathkeys,
>   NULL,
>   &total_groups);
>
> That will match the create_gather_path case.
>
>
Right, I made that change and performed the tests with the fix, and I don't
see any regression now.


> This test case is still failing for me even with those fixes:
>
> rhaas=# select aid+1 from pgbench_accounts group by aid+1;
> ERROR:  could not find pathkey item to sort
>
>
I don't see this failure with the patch.  Even when I forced the gather merge
in the above query, it worked just fine.

Attaching patch, with the discussed changes.



Thanks,
Rushabh Lathia


fix_target_gm_v2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [bug fix] dblink leaks unnamed connections

2017-03-09 Thread Joe Conway
On 03/09/2017 07:54 AM, Tom Lane wrote:
> Fujii Masao  writes:
>> On Wed, Mar 8, 2017 at 3:52 PM, Tsunakawa, Takayuki
>>  wrote:
>>> dblink fails to close the unnamed connection as follows when a new unnamed 
>>> connection is requested.  The attached patch fixes this.
> 
>> This issue was reported about ten years ago and added as TODO item.
>> http://archives.postgresql.org/pgsql-hackers/2007-10/msg00895.php
> 
>> I agree that this is a bug, and am tempted to back-patch to all the supported
>> versions. But it had not been fixed in many years since the first report of
>> the issue. So I'm not sure if it's ok to just treat this as a bug right now 
>> and
>> back-patch. Or we should fix this only in HEAD? Anyway I'd like to hear
>> more opinions about this.
> 
> It looks to me like the issue simply fell through the cracks because Joe
> wasn't excited about fixing it.  Now that we have a second complaint,
> I think it's worth treating as a bug and back-patching.
> 
> (I've not read this particular patch and am not expressing an opinion
> whether it's correct.)

Ok, will take another look.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar
On Thu, Mar 9, 2017 at 9:37 PM, Dilip Kumar  wrote:
>> =# create table r1(value int);
>> CREATE TABLE
>> =# insert into r1 select (random()*1000)::int from generate_Series(1,100);
>> INSERT 0 100
>> =# create index on r1 using brin(value);
>> CREATE INDEX
>> =# set enable_seqscan=0;
>> SET
>> =# explain select * from r1 where value=555;
>
> I am looking into the issue, I have already reproduced it.  I will
> update on this soon.
>
> Thanks for reporting.

I slightly modified your query to reproduce this issue.

explain analyze select * from r1 where value<555;

Patch is attached to fix the problem.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


parallel_bitmap_fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Tom Lane
Stephen Frost  writes:
> Perhaps I'm missing something obvious, but isn't it a bit redundant to
> have both a Recheck condition (which is the predicate of the index) and
> a Filter condition (which is the user's predicate) when we've already
> decided that the user's predicate must result in a subset of the
> index's, as, otherwise, we wouldn't be able to use the index in the
> first place?

Yeah, I think this is just something that the planner doesn't see fit
to expend cycles on detecting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Dilip Kumar
On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar  wrote:
> I slightly modified your query to reproduce this issue.
>
> explain analyze select * from r1 where value<555;
>
> Patch is attached to fix the problem.

I forgot to mention the cause of the problem.

if (istate->schunkptr < istate->nchunks)
{
   PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
BlockNumber chunk_blockno;

In the above if condition we have only checked istate->schunkptr <
istate->nchunks, which means some chunk is left, so we are safe to
access idxchunks.  But just after that we access
ptbase[idxpages[istate->spageptr]] without checking whether accessing
idxpages is safe.

tbm_iterate already handles this case; I broke it in tbm_shared_iterate.
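
A minimal sketch of the guard, mirroring what tbm_iterate() does (not
necessarily the exact committed fix): don't dereference idxpages until
spageptr is known to be in range:

/* hypothetical fragment for tbm_shared_iterate() */
if (istate->schunkptr < istate->nchunks)
{
    PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
    BlockNumber chunk_blockno = chunk->blockno + istate->schunkbit;

    /* touch idxpages only after checking that a per-page entry exists */
    if (istate->spageptr >= istate->npages ||
        chunk_blockno < ptbase[idxpages[istate->spageptr]].blockno)
    {
        /* return a lossy page indicator from the chunk */
    }
}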

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-09 Thread Masahiko Sawada
On Wed, Mar 8, 2017 at 1:43 AM, Peter Geoghegan  wrote:
> On Sat, Mar 4, 2017 at 1:30 AM, Amit Kapila  wrote:
>>> While I can't see this explained anywhere, I'm
>>> pretty sure that that's supposed to be impossible, which this patch
>>> changes.
>>>
>>
>> What makes you think that patch will allow pg_class.relfrozenxid to be
>> advanced past opaque->btpo.xact which was previously not possible?
>
> By not reliably recycling pages in a timely manner, we won't change
> anything about the dead page. It just sticks around. This is mostly
> fine, but we still need VACUUM to be able to reason about it (to
> determine if it is in fact recyclable), which is what I'm concerned
> about breaking here. It still needs to be *possible* to recycle any
> recyclable page at some point (e.g., when we find it convenient).
>
> pg_class.relfrozenxid is InvalidTransactionId for indexes because
> indexes generally don't store XIDs. This is the one exception that I'm
> aware of, presumably justified by the fact that it's only for
> recyclable pages anyway, and those are currently *guaranteed* to get
> recycled quickly. In particular, they're guaranteed to get recycled by
> the next VACUUM. They may be recycled in the course of anti-wraparound
> VACUUM, even if VACUUM has no garbage tuples to kill (even if we only
> do lazy_cleanup_index() instead of lazy_vacuum_index()). This is the
> case that this patch proposes to have us skip touching indexes for.
>

To prevent this, I think we need to not skip lazy_cleanup_index during
an anti-wraparound vacuum (aggressive = true), even if the number of
scanned pages is less than the threshold.  This can ensure that
pg_class.relfrozenxid is not advanced past opaque->btpo.xact with
minimal pain.  Even if a dead btree page is left behind, subsequent
modifications make garbage on the heap and then autovacuum recycles that
page while vacuuming the index (lazy_vacuum_index).
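
A minimal sketch of the proposed condition (the variable and threshold names
are placeholders, not from the actual patch):

/* hypothetical guard in lazy vacuum: never skip index cleanup during
 * an anti-wraparound (aggressive) vacuum */
if (aggressive ||
    vacrelstats->scanned_pages >= cleanup_index_threshold)
    lazy_cleanup_index(indrel, indstats, vacrelstats);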

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > Perhaps I'm missing something obvious, but isn't it a bit redundant to
> > have both a Recheck condition (which is the predicate of the index) and
> > a Filter condition (which is the user's predicate) when we've already
> > decided that the user's predicate must result in a subset of the
> > index's, as, otherwise, we wouldn't be able to use the index in the
> > first place?
> 
> Yeah, I think this is just something that the planner doesn't see fit
> to expend cycles on detecting.

We have already figured out that the user's predicate results in a
subset of the index's or we wouldn't be able to use that index though,
right?  Do we really need to spend cycles re-discovering that?  Are
there cases where we actually need the index's predicate to ever be
included for correctness..?

This seems like we're going out of our way to add in an additional check
for something that we've already determined must always be true and that
strikes me as odd.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Gather Merge

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 11:25 AM, Rushabh Lathia
 wrote:
> I don't see this failure with the patch. Even I forced the gather merge
> in the above query and that just working fine.
>
> Attaching patch, with the discussed changes.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Bitmap scans a bit broken

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 11:50 AM, Dilip Kumar  wrote:
> On Thu, Mar 9, 2017 at 10:02 PM, Dilip Kumar  wrote:
>> I slightly modified your query to reproduce this issue.
>>
>> explain analyze select * from r1 where value<555;
>>
>> Patch is attached to fix the problem.
>
> I forgot to mention the cause of the problem.
>
> if (istate->schunkptr < istate->nchunks)
> {
>PagetableEntry *chunk = &ptbase[idxchunks[istate->schunkptr]];
> PagetableEntry *page = &ptbase[idxpages[istate->spageptr]];
> BlockNumber chunk_blockno;
>
> In the above if condition we have only checked istate->schunkptr <
> istate->nchunks, which means some chunk is left, so we are safe to
> access idxchunks.  But just after that we access
> ptbase[idxpages[istate->spageptr]] without checking whether accessing
> idxpages is safe.
>
> tbm_iterate already handles this case; I broke it in tbm_shared_iterate.

I don't know if this is the only problem -- it would be good if David
could retest -- but it's certainly *a* problem, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Tom Lane
Stephen Frost  writes:
> We have already figured out that the user's predicate results in a
> subset of the index's or we wouldn't be able to use that index though,
> right?  Do we really need to spend cycles re-discovering that?  Are
> there cases where we actually need the index's predicate to ever be
> included for correctness..?

In a bitmap indexscan, yes, because the entire point of the recheck
condition is that we're going to scan a whole page and possibly see
tuples that don't satisfy the index predicate at all.  Another point
that's mentioned in the comments in createplan.c is that if you're
considering the result of a BitmapOr or BitmapAnd, there's not necessarily
a single index involved so it's much harder to reason about which part
of the qual is an index predicate.

I do notice that createplan.c makes some effort to get rid of filter
conditions that are provably true when the index conditions are.
Doesn't look like it considers the reverse direction.  Not sure if
that's missing a bet.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Mark Dilger

> On Mar 9, 2017, at 7:59 AM, Peter Eisentraut 
>  wrote:
> 
> On 2/28/17 22:15, Peter Eisentraut wrote:
>> The SQL standard defines a separate error code for nextval exhausting
>> the sequence space.  I haven't found any discussion of this in the
>> archives, so it seems this was just not considered or not yet in
>> existence when the error codes were introduced.  Here is a patch to
>> correct it.
> 
> committed

Perhaps you should add something to the release notes.  Somebody could be
testing for the old error code.

Mark Dilger

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 7:47 AM, Naytro Naytro  wrote:
> We are having some performance issues after we upgraded to the newest
> version of PostgreSQL; before that, everything was fast and smooth.
>
> The upgrade was done by pg_upgrade from 9.4 directly to 9.6.1. Now we
> have upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.
>
> Problem:
> Very high system CPU when the master is streaming replication data; CPU
> goes up to 77%. Only one process is generating this load: a
> postgresql startup process. When I attached truss to this process I
> saw a lot of read calls with almost the same number of errors (EAGAIN).
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe
>
> The read call tries to read one byte over and over. I looked at the
> source code and I think this file is responsible for the behavior:
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

Our latch implementation did get overhauled pretty thoroughly in 9.6;
see primarily commit 98a64d0bd713cb89e61bef6432befc4b7b5da59e.  But I
can't guess what is going wrong here based on this information.  It
might help if you can pull some stack backtraces from the startup
process.
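
For context, the repeated nonblocking one-byte reads with EAGAIN look like
the self-pipe drain loop in latch.c; roughly (a simplified sketch, not the
exact code):

/* simplified: drain the self-pipe until it is empty */
for (;;)
{
    char        buf[16];
    int         rc = read(selfpipe_readfd, buf, sizeof(buf));

    if (rc < 0)
    {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;              /* pipe is empty, done */
        if (errno == EINTR)
            continue;           /* interrupted, retry */
        elog(ERROR, "read() on self-pipe failed: %m");
    }
    else if (rc == 0)
        elog(ERROR, "unexpected EOF on self-pipe");
}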

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Stephen Frost
Tom,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > We have already figured out that the user's predicate results in a
> > subset of the index's or we wouldn't be able to use that index though,
> > right?  Do we really need to spend cycles re-discovering that?  Are
> > there cases where we actually need the index's predicate to ever be
> > included for correctness..?
> 
> In a bitmap indexscan, yes, because the entire point of the recheck
> condition is that we're going to scan a whole page and possibly see
> tuples that don't satisfy the index predicate at all.

I understand the point of the recheck condition being that we might see
tuples that don't satisfy either the index predicate or the user's
predicate, but that isn't the point - to even consider using that index
we must have already realized that the user's predicate will only be
satisfied in a subset of the cases where the index predicate is
satisfied, making any check of the index predicate necessarily always
true.

> Another point
> that's mentioned in the comments in createplan.c is that if you're
> considering the result of a BitmapOr or BitmapAnd, there's not necessarily
> a single index involved so it's much harder to reason about which part
> of the qual is an index predicate.

We must be pulling the index's predicate to be able to put it into the
Recheck condition in the first place.  What I'm arguing is that once
we've decided that we're able to use an index X, because the values
which the user's predicate will satisfy is a subset of those which
the index's predicate will satisfy, then we can entirely forget about
the index's predicate as being redundant to the user's.

I don't see anything about a BitmapAnd or BitmapOr being relevant to
that.  We will always need to check the user's predicate against the
tuples being returned from the Bitmap Heap Scan, unless by chance it
matches the index's predicate exactly *and* we have an entirely exact
match bitmap without any lossy parts.

> I do notice that createplan.c makes some effort to get rid of filter
> conditions that are provably true when the index conditions are.
> Doesn't look like it considers the reverse direction.  Not sure if
> that's missing a bet.

That actually strikes me as a less likely condition to have in the
general case, doesn't it?  Wouldn't that only happen if the index
predicate and the user predicate are identical, because otherwise we
either can't use the index or we must keep the user's predicate because
it will only be satisfied in a subset of cases?

Thanks!

Stephen




Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Peter Eisentraut
On 3/9/17 12:27, Mark Dilger wrote:
> Perhaps you should add something to the release notes.  Somebody could be
> testing for the old error code.

The release notes will be written when the release is prepared.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Write Ahead Logging for Hash Indexes

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 10:23 AM, Amit Kapila  wrote:
>> +mode anyway).  It would seem natural to complete the split in VACUUM, but 
>> since
>> +splitting a bucket might require allocating a new page, it might fail if you
>> +run out of disk space.  That would be bad during VACUUM - the reason for
>> +running VACUUM in the first place might be that you run out of disk space,
>> +and now VACUUM won't finish because you're out of disk space.  In contrast,
>> +an insertion can require enlarging the physical file anyway.
>>
>> But VACUUM actually does try to complete the splits, so this is wrong, I 
>> think.
>
> Vacuum just tries to clean the remaining tuples from old bucket.  Only
> insert or split operation tries to complete the split.

Oh, right.  Sorry.

>> +if (!xlrec.is_primary_bucket_page)
>> +XLogRegisterBuffer(0, bucket_buf, REGBUF_STANDARD);
>>
>> So, in hashbucketcleanup, the page is registered and therefore an FPI
>> will be taken if this is the first reference to this page since the
>> checkpoint, but hash_xlog_delete won't restore the FPI.  I think that
>> exposes us to a torn-page hazard.
>> _hash_freeovflpage/hash_xlog_squeeze_page seems to have the same
>> disease, as does _hash_squeezebucket/hash_xlog_move_page_contents.
>
> Right, if we use XLogReadBufferForRedoExtended() instead of
> XLogReadBufferExtended()/LockBufferForCleanup during relay routine,
> then it will restore FPI when required and can serve our purpose of
> acquiring clean up lock on primary bucket page.  Yet another way could
> be to store some information like block number, relfilenode, forknum
> (maybe we can get away with some less info) of primary bucket in the
> fixed part of each of these records and then using that read the page
> and then take a cleanup lock.   Now, as in most cases, this buffer
> needs to be registered only in cases when there are multiple overflow
> pages, I think having fixed information might hurt in some cases.

Just because the WAL record gets larger?  I think you could arrange to
record the data only in the cases where it is needed, but I'm also not
sure it would matter that much anyway.  Off-hand, it seems better than
having to mark the primary bucket page dirty (because you have to set
the LSN) every time any page in the chain is modified.

>> +/*
>> + * Note that we still update the page even if it was restored from a 
>> full
>> + * page image, because the bucket flag is not included in the image.
>> + */
>>
>> Really?  Why not?  (Because it's part of the special space?)
>
> Yes, because it's part of special space.  Do you want that to
> mentioned in comments?

Seems like a good idea.  Maybe just change the end to "because the
special space is not included in the image" to make it slightly more
explicit.

>> Doesn't hash_xlog_split_allocpage need to acquire cleanup locks to
>> guard against concurrent scans?
>
> I don't think so, because regular code takes it on old bucket to
> protect the split from concurrent inserts and cleanup lock on new
> bucket is just for the sake of consistency.

You might be right, but this definitely does create a state on the
standby that can't occur on the master: the bucket being split is
flagged as such, but the bucket being populated doesn't appear to
exist yet.  Maybe that's OK.  It makes me slightly nervous to have the
standby take a weaker lock than the master does, but perhaps it's
harmless.

>> And also hash_xlog_split_complete?
>
> In regular code, we are not taking cleanup lock for this operation.

You're right, sorry.

>> And hash_xlog_delete?  If the regular code path is getting a cleanup
>> lock to protect against concurrent scans, recovery better do it, too.
>>
>
> We are already taking cleanup lock for this, not sure what exactly is
> missed here, can you be more specific?

Never mind, I'm wrong.

>> +/*
>> + * There is no harm in releasing the lock on old bucket before new 
>> bucket
>> + * during replay as no other bucket splits can be happening 
>> concurrently.
>> + */
>> +if (BufferIsValid(oldbuf))
>> +UnlockReleaseBuffer(oldbuf);
>>
>> And I'm not sure this is safe either.  The issue isn't that there
>> might be another bucket split happening concurrently, but that there
>> might be scans that get confused by seeing the bucket split flag set
>> before the new bucket is created or the metapage updated.
>
> During scan we check for bucket being populated, so this should be safe.

Yeah, I guess so.  This business of the standby taking weaker locks
still seems scary to me, as in hash_xlog_split_allocpage above.

>>  Seems like
>> it would be safer to keep all the locks for this one.
>
> If you feel better that way, I can change it and add a comment saying
> something like "we could release lock on bucket being split early as
> well, but doing here to be consistent with normal operation"

I think that might not be a bad idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Peter Eisentraut
On 3/6/17 05:27, Petr Jelinek wrote:
> updated and rebased version of the patch attached.

Some comments on this patch:

Can we have a better explanation of different snapshot options in
CREATE_REPLICATION_SLOT.  We use all these variants in different
places, but it's not fully documented why.  Think about interested
users reading this.


errmsg("subscription table %u in subscription %u does not exist",

Use names instead of IDs if possible.


+   libpqrcv_table_list,
+   libpqrcv_table_info,
+   libpqrcv_table_copy,

I don't think these functions belong into the WAL receiver API, since
they are specific to this particular logical replication
implementation.  I would just make an API function libpqrc_exec_sql()
that runs a query, and then put the table_*() functions as wrappers
around that into tablesync.c.


Not sure what the worker init stuff in ApplyLauncherShmemInit() is
doing.  Could be commented more.


There are a lot of places that do one of

MyLogicalRepWorker->relid == InvalidOid
OidIsValid(MyLogicalRepWorker->relid)

To check whether the current worker is a sync worker.  Wrap that into
a function.


Not a fan of adding CommandCounterIncrement() calls at the end of
functions.  In some cases, they are not necessary at all.  In cases
where they are, the CCI call should be at a higher level between two
function calls with a comment for why the call below needs to see the
changes from the call above.


The index name pg_subscription_rel_map_index/SubscriptionRelMapIndexId
doesn't seem to match existing style, since there is no "map" column.
How about pg_subscription_rel_rel_sub_index?  I see we have a
similarly named index for publications.


max_sync_workers_per_subscription could be PGC_SIGHUP, I think.  And
the minimum could be 0, to stop any syncing?


pg_subscription_rel.h: I'm not sure under which circumstances we need
to use BKI_ROWTYPE_OID.


Does pg_subscription_rel need an OID column?  The index
SubscriptionRelOidIndexId is not used anywhere.


You removed this command from the tests:

ALTER SUBSCRIPTION testsub SET PUBLICATION testpub, testpub1;

I suppose because it causes a connection.  We should have an option to
prevent that (NOCONNECT/NOREFRESH?).


Why was the test 'check replication origin was dropped on subscriber'
removed?


Attached also a small patch with some stylistic changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c6ba1eaa3c5a44e4a9f6d072cb95fcf7e68ba3d6 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Mar 2017 08:19:06 -0500
Subject: [PATCH] fixup! Logical replication support for initial data copy

---
 doc/src/sgml/catalogs.sgml  |  9 +
 doc/src/sgml/logical-replication.sgml   | 18 +-
 doc/src/sgml/monitoring.sgml|  4 ++--
 doc/src/sgml/protocol.sgml  | 14 +++---
 doc/src/sgml/ref/alter_subscription.sgml|  8 
 doc/src/sgml/ref/create_subscription.sgml   | 13 ++---
 src/backend/catalog/pg_subscription.c   | 13 -
 src/backend/replication/logical/launcher.c  | 18 +++---
 src/backend/replication/logical/snapbuild.c |  6 +++---
 src/backend/replication/logical/worker.c|  4 ++--
 src/backend/replication/repl_gram.y |  1 +
 src/backend/replication/walsender.c |  2 +-
 src/backend/tcop/postgres.c |  8 +---
 13 files changed, 56 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f587a08b6a..ab78585035 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -302,7 +302,7 @@ System Catalogs
 
  
   pg_subscription_rel
-  relation state mapping for subscriptions
+  relation state for subscriptions
  
 
  
@@ -6429,14 +6429,14 @@ pg_subscription_rel
 
   
The catalog pg_subscription_rel contains the
-   status for each replicated relation in each subscription.  This is a
+   state for each replicated relation in each subscription.  This is a
many-to-many mapping.
   
 
   
-   This catalog only contains tables known to subscription after running
+   This catalog only contains tables known to the subscription after running
either CREATE SUBSCRIPTION or
-   ALTER SUBSCRIPTION ... REFRESH commands.
+   ALTER SUBSCRIPTION ... REFRESH.
   
 
   
@@ -6472,6 +6472,7 @@ pg_subscription_rel Columns
   char
   
   
+   State code:
i = initialize,
d = data is being copied,
s = synchronized,
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index f75304cd91..4ec6bb49b7 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -24,8 +24,8 @@ Logical Replication
  
 
  
-  Logical replication typically starts with a snapshot of the data on
-  the publisher database. Once that is do

Re: [HACKERS] Logical replication existing data copy

2017-03-09 Thread Peter Eisentraut
On 3/6/17 05:27, Petr Jelinek wrote:
> And lastly I changed the automagic around exporting, not exporting and
> using the snapshot produced by CREATE_REPLICATION_SLOT into explicit
> parameter for the CREATE_REPLICATION_SLOT command (and added simple
> framework for adding more of those if needed in the future).

It might be useful to make this into a separate patch, for clarity.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 08.03.2017 20:48, Peter van Hardenberg wrote:

Small point of order: YAML is not strictly a super-set of JSON.


I haven't read the whole standard, but from what I can see the standard 
considers JSON an official subset of itself: 
http://www.yaml.org/spec/1.2/spec.html


Regards,
Sven




Re: [HACKERS] use SQL standard error code for nextval

2017-03-09 Thread Tom Lane
Mark Dilger  writes:
>> On Mar 9, 2017, at 7:59 AM, Peter Eisentraut 
>>  wrote:
>> On 2/28/17 22:15, Peter Eisentraut wrote:
>>> The SQL standard defines a separate error code for nextval exhausting
>>> the sequence space.  I haven't found any discussion of this in the
>>> archives, so it seems this was just not considered or not yet in
>>> existence when the error codes were introduced.  Here is a patch to
>>> correct it.

> Perhaps you should add something to the release notes.  Somebody could be
> testing for the old error code.

The release notes for v10 aren't going to be drafted for months yet.
When they are, hopefully the writer will notice that this should be
listed as an incompatible change.  That's not the responsibility
of this commit, although it would've been better if the commit log
entry explicitly pointed out that it's an incompatible change.
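
For anyone who is testing for the old code, the change is easy to see
(a sketch; the new SQLSTATE is the standard's 2200H,
sequence_generator_limit_exceeded, and the old one was 55000, if memory
serves):

CREATE TEMP SEQUENCE s MAXVALUE 1;
SELECT nextval('s');  -- returns 1
SELECT nextval('s');  -- fails; the SQLSTATE is what the commit changed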

regards, tom lane




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 12:48 PM, Sven R. Kunze  wrote:
> On 08.03.2017 20:48, Peter van Hardenberg wrote:
>>
>> Small point of order: YAML is not strictly a super-set of JSON.
>
> I haven't read the whole standard, but from what I can see the standard
> considers JSON an official subset of itself:
> http://www.yaml.org/spec/1.2/spec.html

But there's apparent sophistry, like this, in that spec:

JSON's RFC4627 requires that mappings keys merely “SHOULD” be unique,
while YAML insists they “MUST” be. Technically, YAML therefore
complies with the JSON spec, choosing to treat duplicates as an error.
In practice, since JSON is silent on the semantics of such duplicates,
the only portable JSON files are those with unique keys, which are
therefore valid YAML files.

I don't see how YAML can impose a stronger requirement than JSON and
yet claim to be a superset; a JSON document that doesn't meet that
requirement will be legal (if stupid) as JSON but illegal as YAML.
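
Postgres itself is, incidentally, a fine example of that silence about
duplicate-key semantics; just an observation, not a YAML argument:

SELECT '{"k": 1, "k": 2}'::json;   -- accepted, input text kept verbatim
SELECT '{"k": 1, "k": 2}'::jsonb;  -- accepted, collapses to {"k": 2}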

Also, even if the superset thing were true on a theoretical plane, I'm
not sure it would do us much good in practice.  If we start using
YAML-specific constructs, we won't have valid JSON any more.  If we
use only things that are legal in JSON, YAML's irrelevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Enabling atomics on ARM64

2017-03-09 Thread Robert Haas
On Wed, Mar 8, 2017 at 10:18 PM, Roman Shaposhnik  wrote:
> I'd like to offer a forward port from a change I'm contributing
> the Greenplum code base over here:
> https://github.com/greenplum-db/gpdb/pull/1983

Thanks.   Please add this to https://commitfest.postgresql.org/14/ so
it doesn't get forgotten.  I'm afraid your patch is probably too late
for v10 (the deadline for patches to be considered for v10 was March
1st, though somebody might propose to make an exception), but v11 will
be here soon enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] partial indexes and bitmap scans

2017-03-09 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I do notice that createplan.c makes some effort to get rid of filter
>> conditions that are provably true when the index conditions are.
>> Doesn't look like it considers the reverse direction.  Not sure if
>> that's missing a bet.

> That actually strikes me as a less likely condition to have in the
> general case, doesn't it?  Wouldn't that only happen if the index
> predicate and the user predicate are identical, because otherwise we
> either can't use the index or we must keep the user's predicate because
> it will only be satisfied in a subset of cases?

Well, remember that the planner's idea of an ideal situation is to not
have any filter conditions, not to not have any index (a/k/a recheck)
conditions.  It's going to try to put as much load as it can on the index
condition side of things, and that gives rise to the need for rechecks.
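
A tiny illustration of that preference, with hypothetical names:

CREATE TABLE t (a int, b int);
CREATE INDEX t_a_idx ON t (a);
-- If the index is used at all, "a > 10" becomes the Index Cond (or the
-- Recheck Cond in a bitmap scan) while "b = 1" stays behind as a Filter:
EXPLAIN SELECT * FROM t WHERE a > 10 AND b = 1;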

It seems like there might be some mileage to be gained by reversing the
proof direction here, and having it get rid of recheck conditions that are
implied by filter conditions rather than vice versa.  I'm not quite
convinced though, and I'm also not sure how hard it would be to mechanize.
A lot of that code is shared between the bitmap and plain-indexscan cases,
which could make it tricky to not break the plain-indexscan case.

regards, tom lane




Re: [HACKERS] updating parallel.sgml

2017-03-09 Thread Robert Haas
On Wed, Mar 8, 2017 at 12:46 PM, Robert Haas  wrote:
> Here's a patch which updates parallel.sgml for the new parallel scan
> types which have recently been committed, and also expands the
> explanation of parallel joins slightly.  I hope everyone will find
> this useful in understanding what new capabilities we now have, and
> what remains to be done in the future.
>
> Barring objections, I plan to commit this tomorrow.

Committed with a fix for a typo pointed out to me off-list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 08.03.2017 20:52, Magnus Hagander wrote:
On Wed, Mar 8, 2017 at 11:48 AM, Peter van Hardenberg wrote:


Small point of order: YAML is not strictly a super-set of JSON.

Editorializing slightly, I have not seen much interest in the
world for YAML support though I'd be interested in evidence to the
contrary.


The world of configuration management seems to for some reason run off 
YAML, but that's the only places I've seen it recently (ansible, 
puppet etc).


SaltStack uses YAML for their tools, too. I personally can empathize 
with them (as a user of configuration management) about this as writing 
JSON would be a nightmare with all the quoting, commas, curly braces, etc. 
But that's my own preference maybe.


(Btw. does "run off" mean like or avoid? At least my dictionaries tend 
to the latter.)


That said if we're introducing something new, it's usually better to 
copy from another format than to invent your own.


From my day-to-day work I can tell that the date(time) type is the only 
missing piece of JSON to make it perfect for business applications 
(besides, maybe, a "currency" type).


Regards,
Sven


Re: [HACKERS] foreign partition DDL regression tests

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 1:19 AM, Ashutosh Bapat
 wrote:
>>> At least we need to update the documentation.
>>
>> Got a proposal?
>
> How about something like attached?

Committed with some revisions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 09.03.2017 18:58, Robert Haas wrote:

Also, even if the superset thing were true on a theoretical plane, I'm
not sure it would do us much good in practice.  If we start using
YAML-specific constructs, we won't have valid JSON any more.  If we
use only things that are legal in JSON, YAML's irrelevant.


That's true. I just wanted to share my view of the "date guessing" part 
of pgpro's commits.
I don't have a good solution for it either, I can only tell that where I 
work we do have same issues: either we guess by looking at the string 
value or we know that "this particular key" must be a date.
Unsatisfied with either solution, we tend to use YAML for our APIs if 
possible.


Regards,
Sven




Re: [HACKERS] Documentation improvements for partitioning

2017-03-09 Thread Robert Haas
I think you might have the titles for 0002 and 0003 backwards.

On Fri, Mar 3, 2017 at 2:51 AM, Amit Langote
 wrote:
> 0002: some cosmetic fixes to create_table.sgml

I think this sentence may be unclear to some readers:

+ One might however want to set it for only some partitions,
+  which is possible by doing SET NOT NULL on individual
+  partitions.

I think you could replace this with something like: Even if there is
no NOT NULL constraint on the parent, such a constraint
can still be added to individual partitions, if desired; that is, the
children can disallow nulls even if the parent allows them, but not
the other way around.
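
That is, roughly (a minimal sketch with hypothetical names):

CREATE TABLE parent (a int, b text) PARTITION BY LIST (a);
CREATE TABLE child PARTITION OF parent FOR VALUES IN (1);
-- allowed: the child may be stricter than the parent
ALTER TABLE child ALTER COLUMN b SET NOT NULL;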

> 0003: add clarification about NOT NULL constraint on partition columns in
>   alter_table.sgml

This is about list-ifying a note, but I think we should try to
de-note-ify it.  It's a giant block of text that is not obviously more
noteworthy than the surrounding text; I think  should be saved
for things that particularly need to be called out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Peter Eisentraut
On 3/8/17 04:11, Ashutosh Bapat wrote:
> The header for this table is "list of relations", so the type gets
> associated with the relation, indicating the type of relation.  "btree" or
> "gin" as a type of relation doesn't sound really great. Instead we might want to
> add another column "access method" and specify the access method used
> for that relation.

I like that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-03-09 Thread Robert Haas
On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> Wonder if we there's an argument to be made for implementing this
>> roughly similarly to split_pathtarget_at_srf - instead of injecting a
>> ProjectSet node we'd add a FunctionScan node below a Result node.
>
> Yeah, possibly.  That would have the advantage of avoiding an ExecProject
> step when the SRFs aren't buried, which would certainly be the expected
> case.
>
> If you don't want to make ExecInitExpr responsible, then the planner would
> have to do something like split_pathtarget_at_srf anyway to decompose the
> expressions, no matter which executor representation we use.

Did we do anything about this?  Are we going to?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Peter Eisentraut
On 3/8/17 08:30, Stephen Frost wrote:
> Right, I don't think having an extra column which is going to be NULL a
> large amount of the time is good.

Note that \di already has a column "Table" that is null for something
that is not an index.  So I don't think this argument is very strong.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/8/17 04:11, Ashutosh Bapat wrote:
> > The header for this table is "list of relations", so the type gets
> > associated with the relation, indicating the type of relation.  "btree" or
> > "gin" as a type of relation doesn't sound really great. Instead we might want to
> > add another column "access method" and specify the access method used
> > for that relation.
> 
> I like that.

When it will be NULL for all of the regular tables..?

I'd rather have the one Type column that just included the access method
when there is one to include.

Thanks!

Stephen




Re: [HACKERS] PATCH: psql show index with type info

2017-03-09 Thread Stephen Frost
* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 3/8/17 08:30, Stephen Frost wrote:
> > Right, I don't think having an extra column which is going to be NULL a
> > large amount of the time is good.
> 
> Note that \di already has a column "Table" that is null for something
> that is not an index.  So I don't think this argument is very strong.

That's an interesting point.

I think what I find most odd about all of this is that \dti and \dit
work at all, and give a different set of columns from \dt.  We don't
document that combining those works in \?, as far as I can see, and
other combinations don't work, just this.

In any case, I won't push very hard on this, it's useful information to
include and we should do so.

Thanks!

Stephen




Re: [HACKERS] New CORRESPONDING clause design

2017-03-09 Thread Pavel Stehule
hi

2017-03-09 17:19 GMT+01:00 Pavel Stehule :

>
>
> 2017-03-09 13:18 GMT+01:00 Surafel Temesgen :
>
>> Hi ,
>>
>> Here is a patch corrected per your feedback, except for the missed test cases,
>> because the corresponding by clause is implemented on top of set operations
>> and you can’t do that to a set operation without a corresponding by clause either
>>
>
> I don't understand.
>
> The following statement should work
>
> postgres=# select 10 as a, 20 as b union corresponding select 20 as a, 30
> as b, 40 as c;
>
> ERROR:  each UNION query must have the same number of columns
> LINE 1: ...elect 10 as a, 20 as b union corresponding select 20 as a, 3...
>
>
> The corresponding clause should work like a projection filter.
>
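
In other words, with standard CORRESPONDING semantics something like the
following should succeed, keeping only the commonly named columns (a
sketch of the expected behavior):

SELECT 10 AS a, 20 AS b
UNION CORRESPONDING
SELECT 20 AS a, 30 AS b, 40 AS c;
-- expected: columns (a, b) with rows (10, 20) and (20, 30)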

I found a link to the postgresql mailing list related to an older attempt to
implement this feature:


https://www.postgresql.org/message-id/cajzswkx7c6wmfo9py4baf8vhz_ofko3afsosjpsb17rgmgb...@mail.gmail.com

> Regards
>
> Pavel
>
>
>>
>> Eg
>>
>>
>> postgres=# SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>>
>> ERROR:  each UNION query must have the same number of columns
>>
>> LINE 1: SELECT 1 a, 2 b, 3 c UNION SELECT 4 a, 5 b, 6 c, 8 d;
>>
>>   ^
>>
>> postgres=# create table t1(a int, b int, c int);
>>
>> CREATE TABLE
>>
>> postgres=# create table t2(a int, b int);
>>
>> CREATE TABLE
>>
>>
>>
>> postgres=# select * from t1 union select * from t2;
>>
>> ERROR:  each UNION query must have the same number of columns
>>
>> LINE 1: select * from t1 union select * from t2;
>>
>>
>>
>> Thanks
>>
>> Surafel
>>
>> On Tue, Mar 7, 2017 at 10:26 PM, Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> I am sending a review of this interesting feature.
>>>
>>> I found following issues, questions:
>>>
>>> 1. unclosed tags  in documentation
>>> 2. bad name "changeTargetEntry" - should be makeTargetEntry?
>>> 3. Why did you remove a lot of asserts in prepunion.c? These asserts should
>>> still be valid
>>> 4. make_coresponding_target has wrong formatting
>>> 5. error "%s queries with a CORRESPONDING clause must have at least one
>>> column with the same name" has wrong formatting, you can show position
>>> 6. previous issue is repeated - look on formatting ereport function,
>>> please, you can use DETAIL and HINT fields
>>> 7. the corresponding clause should contain a column list (I am looking at
>>> ANSI/SQL 99) - you are using expr_list, which makes no sense and probably
>>> has an impact on the whole implementation.
>>> 8. typo orderCorrespondingLsit(List *targetlist)
>>> 9. I miss more tests for CORRESPONDING BY
>>> 10. if I understand this feature, this query should work
>>>
>>> postgres=# SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 
>>> b, 6 c, 8 d;
>>> ERROR:  each UNION query must have the same number of columns
>>> LINE 1: ...1 a, 2 b, 3 c UNION CORRESPONDING BY (c,b) SELECT 4 a, 5 b, ...
>>>
>>> postgres=# create table t1(a int, b int, c int);
>>> CREATE TABLE
>>> Time: 63,260 ms
>>> postgres=# create table t2(a int, b int);
>>> CREATE TABLE
>>> Time: 57,120 ms
>>> postgres=# select * from t1 union corresponding select * from t2;
>>> ERROR:  each UNION query must have the same number of columns
>>> LINE 1: select * from t1 union corresponding select * from t2;
>>>
>>> If it is your first patch to Postgres, then it is perfect work!
>>>
>>> The @7 is probably most significant - I dislike an expression list there.
>>> name_list should be better there.
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>
>


Re: [HACKERS] Couple of issues with prepared FETCH commands

2017-03-09 Thread Robert Haas
On Wed, Jan 11, 2017 at 11:28 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jan 10, 2017 at 5:38 PM, Andrew Gierth
>>  wrote:
>>> But the problem that actually came up is this: if you do the PQprepare
>>> before the named cursor has actually been opened, then everything works
>>> _up until_ the first event, such as a change to search_path, that forces
>>> a revalidation; and at that point it fails with the "must not change
>>> result type" error _even if_ the cursor always has exactly the same
>>> result type.  This happens because the initial prepare actually stored
>>> NULL for plansource->resultDesc, since the cursor name wasn't found,
>>> while on the revalidate, when the cursor obviously does exist, it gets
>>> the actual result type.
>>>
>>> It seems a bit of a "gotcha" to have it fail in this case when the
>>> result type isn't actually being checked in other cases.
>
>> To me, that sounds like a bug.
>
> Yeah --- specifically, I wonder why we allow the reference to an
> unrecognized cursor name to succeed.  Or were you defining the bug
> differently?

I'm not sure whether that's a bug or not.  What I was defining as a
bug is calling a change from "we don't know what the result type will
be" to "we know that the result type will be X" as a change in the
result type.  That's really totally inaccurate.

I've never really understood errors about changing the result type.
As a user, I assumed those were unavoidable implementation artifacts,
on the theory that they were annoying and therefore the developers
would have eliminated such messages had it been practical.  As a
developer, I've never gotten around to understanding whether that
theory was correct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Peter van Hardenberg
Anecdotally, we just stored dates as strings and used a convention (key
ends in "_at", I believe) to interpret them. The lack of support for dates
in JSON is well-known, universally decried... and not a problem the
PostgreSQL community can fix.
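
Concretely, something like this, with a hypothetical table:

-- ISO-8601 text inside the document, cast on the way out by convention:
SELECT (doc->>'created_at')::timestamptz FROM events;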

On Thu, Mar 9, 2017 at 10:24 AM, Sven R. Kunze  wrote:

> On 09.03.2017 18:58, Robert Haas wrote:
>
>> Also, even if the superset thing were true on a theoretical plane, I'm
>> not sure it would do us much good in practice.  If we start using
>> YAML-specific constructs, we won't have valid JSON any more.  If we
>> use only things that are legal in JSON, YAML's irrelevant.
>>
>
> That's true. I just wanted to share my view of the "date guessing" part of
> pgpro's commits.
> I don't have a good solution for it either, I can only tell that where I
> work we do have same issues: either we guess by looking at the string value
> or we know that "this particular key" must be a date.
> Unsatisfied with either solution, we tend to use YAML for our APIs if
> possible.
>
>
> Regards,
> Sven
>
>
>



-- 
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt."—Kurt Vonnegut


Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Kuntal Ghosh
Hello Amit,

On Tue, Mar 7, 2017 at 4:24 PM, Amit Langote
 wrote:
> Hi Kuntal,
>
> Patches apply and compile fine.  Works as advertised.
>
> Some minor comments on the patches themselves.
>
Thanks for the review.

> In 0001:
>
> - * pgstat_bestart() -
> + * pgstat_procstart() -
> + *
> + *  Initialize this process's entry in the PgBackendStatus array.
> + *  Called from InitPostgres and AuxiliaryProcessMain.
>
> Not being called from AuxiliaryProcessMain().  Maybe leftover comment from
> a previous version.  Actually I see that in patch 0002, Main() functions
> of various auxiliary processes call pgstat_procstart, not
> AuxiliaryProcessMain.
>
Fixed.

> + * user-defined functions which expects ids of backends starting from
> 1 to
>
> s/expects/expect/g
>
Fixed.

> +/*
> + * AuxiliaryPidGetProc -- get PGPROC for an auxiliary process
> + * given its PID
> + *
> + * Returns NULL if not found.
> + */
> +PGPROC *
> +AuxiliaryPidGetProc(int pid)
> +{
> +PGPROC *result;
>
> Initialize to NULL so that the comment above is true. :)
>
Fixed.

> In 0002:
>
> @@ -248,6 +248,9 @@ BackgroundWriterMain(void)
>   */
>  prev_hibernate = false;
>
> +/* report walwriter process in the PgBackendStatus array */
> +pgstat_procstart();
> +
>
> s/walwriter/writer/g
Fixed.

> Patch 0004 should update monitoring.sgml.
Added.

I've attached the updated patches. PFA.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Infra-to-expose-non-backend-processes-in-pg_stat_get.patch
Description: binary/octet-stream


0002-Expose-stats-for-auxiliary-processes-in-pg_stat_get_.patch
Description: binary/octet-stream


0003-Expose-stats-for-autovacuum-launcher-and-bgworker.patch
Description: binary/octet-stream


0004-Add-proc_type-column-in-pg_stat_get_activity.patch
Description: binary/octet-stream



Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-03-09 Thread Robert Haas
On Wed, Jan 25, 2017 at 12:44 AM, Craig Ringer  wrote:
>> The way that SetTransactionIdLimit() now works looks a bit dangerous.
>> xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
>> passed-in oldestXid value and written straight into shared memory.
>> But the shared memory copy of oldestXid could have a different value.
>> I'm not sure if that breaks anything, but it certainly weakens any
>> confidence callers might have had that all those values are consistent
>> with each other.
>
> This was my main hesitation with the whole thing too.
>
> It's necessary to advance oldestXmin before we xlog the advance and
> truncate clog, and necessary to advance the vacuum limits only
> afterwards.

Well, that's why I tried to advocate that there should be two separate
XID limits in shared memory: leave what's there now the way it is, and
then add a new limit for "don't try to look up XIDs before this point:
splat".  I still think that'd be less risky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Add pg_disable_checksums() and supporting infrastructure

2017-03-09 Thread Robert Haas
On Sun, Feb 19, 2017 at 12:02 PM, David Christensen  wrote:
> Hi Robert, this is part of a larger patch which *does* enable the checksums 
> online; I’ve been extracting the necessary pieces out with the understanding 
> that some people thought the disable code might be useful in its own merit.  
> I can add documentation for the various states.  The CHECKSUM_REVALIDATE is 
> the only one I feel is a little wibbly-wobbly; the other states are required 
> because of the fact that enabling checksums requires distinguishing between 
> “in process of enabling” and “everything is enabled”.

Ah, sorry, I had missed that patch.

> My design notes for the patch were submitted to the list with little comment; 
> see: 
> https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>
> I have since added the WAL logging of checksum states, however I’d be glad to 
> take feedback on the other proposed approaches (particularly the system 
> catalog changes + the concept of a checksum cycle).

I think the concept of a checksum cycle might be overkill.  It would
be useful if we ever wanted to change the checksum algorithm, but I
don't see any particular reason why we need to build infrastructure
for that.  If we wanted to change the checksum algorithm, I think it
likely that we'd do that in the course of, say, widening it to 4 bytes
as part of a bigger change in the page format, and this infrastructure
would be too narrow to help with that.

While I'm glad to see work finally going on to allow enabling and
disabling checksums, I think it's too late to put this in v10.  We
have a rough rule that large new patches shouldn't be appearing for
the first time in the last CommitFest, and I think this falls into
that category.  I also think it would be unwise to commit just the
bits of the infrastructure that let us disable checksums without
having time for a thorough review of the whole thing; to me, the
chances that we'll make decisions that we later regret seem fairly
high.  I would rather wait for v11 and have a little more time to try
to get everything right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL/JSON in PostgreSQL

2017-03-09 Thread Sven R. Kunze

On 09.03.2017 19:50, Peter van Hardenberg wrote:
Anecdotally, we just stored dates as strings and used a convention 
(key ends in "_at", I believe) to interpret them. The lack of support 
for dates in JSON is well-known, universally decried... and not a 
problem the PostgreSQL community can fix.


I completely agree here.

Regards,
Sven




Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-09 Thread Peter Eisentraut
This is looking pretty neat.  I played around with it a bit.  There are
a couple of edge cases that you need to address, I think.

- Does not support \x

- When \pset format is rst, then \pset linestyle also shows up as
  "rst".  That is wrong.  Same for markdown.

- Broken output in tuples_only (\t) mode. (rst and markdown)

- rst: Do something about \pset title; the way it currently shows up
  appears to be invalid; could use ".. table:: title" directive

- markdown: Extra blank line between table and footer.

- markdown: We should document or comment somewhere exactly which of the
  various markdown table formats this is supposed to produce.  (Pandoc
  pipe_tables?)

- markdown: Table title needs to be after the table, like

Table: title

- markdown: Needs to escape | characters in cell contents.  (Not
  needed for rst.)  More escaping might be needed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] bytea_output vs make installcheck

2017-03-09 Thread Peter Eisentraut
On 2/14/17 16:50, Jeff Janes wrote:
> make installcheck currently fails against a server running
> with bytea_output = escape.
> 
> Making it succeed is fairly easy, and I think it is worth doing.
> 
> Attached are two options for doing that.  One overrides bytea_output
> locally where-ever needed, and the other overrides it for the entire
> 'regression' database.

I would use option 2 here (ALTER DATABASE) and be done with it.  Some
people didn't like using ALTER DATABASE, but it's consistent with
existing use.  If someone wants to change that, that can be independent
of this issue.
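
Presumably just:

ALTER DATABASE regression SET bytea_output = 'hex';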

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management

2017-03-09 Thread Peter Geoghegan
On Wed, Mar 8, 2017 at 5:23 PM, Robert Haas  wrote:
> I think something like 0007-hj-shared-buf-file-v6.patch from
> https://www.postgresql.org/message-id/CAEepm=3g=dMG+84083fkFzLvgMJ7HdhbGB=aezabnukbzm3...@mail.gmail.com
> is probably a good approach to this problem.  In essence, it dodges
> the problem of trying to transfer ownership by making ownership be
> common from the beginning.  That's what I've been recommending, or
> trying to, anyway.

I think that I'm already doing that, to at least the same extent that
0007-hj-shared-buf-file-v6.patch is.

That patch seems to be solving the problem by completely taking over
management of temp files from fd.c. That is, these temp files are not
marked as temp files in the way ordinary temp BufFiles are, with
explicit buy-in from resowner.c about their temp-ness. It adds
"#include "catalog/pg_tablespace.h" to buffile.c, even though that
kind of thing generally lives within fd.c. It does use
PG_TEMP_FILE_PREFIX, but never manages to use temp_tablespaces if
that's set by user. It also doesn't do anything about temp_file_limit
enforcement.

Quite a lot of thought seems to have gone into making the
fd.c/resowner mechanism leak-proof in the face of errors. So, quite
apart from what that approaches misses out on, I really don't want to
change resource management too much. As I went into in my "recap", it
seems useful to change as little as possible about temp files.
Besides, because parallel hash join cannot know the size of the
BufFiles from workers in advance, and thus cannot replicate fd.c fully
by having state for each segment, it ends up actually *relying on*
ENOENT-on-unlink() as a condition that terminates execution:

> +bool
> +BufFileDeleteShared(Oid tablespace, pid_t pid, int set, int partition,
> +   int participant)
> +{
> +   chartempdirpath[MAXPGPATH];
> +   chartempfilepath[MAXPGPATH];
> +   int segment = 0;
> +   boolfound = false;
> +
> +   /*
> +* We don't know if the BufFile really exists, because SharedBufFile
> +* tracks only the range of file numbers.  If it does exists, we don't
> +* khow many 1GB segments it has, so we'll delete until we hit ENOENT or
> +* an IO error.
> +*/
> +   for (;;)
> +   {
> +   make_shareable_path(tempdirpath, tempfilepath,
> +   tablespace, pid, set, partition,
> +   participant, segment);
> +   if (!PathNameDelete(tempfilepath))
> +   break;
> +   found = true;
> +   ++segment;
> +   }
> +
> +   return found;
> +}

However, the whole point of my fortifying the refcount mechanism for
V9 of parallel tuplesort is to not have to ignore a ENOENT like this,
on the grounds that that's ugly/weird (I pointed this out when I
posted my V8, actually). Obviously I could very easily teach fd.c not
to LOG a complaint about an ENOENT on unlink() for the relevant
parallel case, on the assumption that it's only down to an error
during the brief period of co-ownership of a Buffile at the start of
leader unification. Is that acceptable?

Anyway, the only advantage I immediately see with the approach
0007-hj-shared-buf-file-v6.patch takes (over what I've provisionally
written as my V9) is that by putting everything in shared memory all
along, there is no weirdness with tying local memory clean-up to a
shared memory on_dsm_detach() callback. As I said, stashing a local
pointer for the leader in shared memory is weird, even if it
accomplishes the stated goals for V9.

-- 
Peter Geoghegan




Re: [HACKERS] Performance issue after upgrading from 9.4 to 9.6

2017-03-09 Thread Andres Freund
Hi,

On 2017-03-09 13:47:35 +0100, Naytro Naytro wrote:
> We are having some performance issues after we upgraded to newest
> version of PostgreSQL, before it everything was fast and smooth.
> 
> Upgrade was done by pg_upgrade from 9.4 directly do 9.6.1. Now we
> upgraded to 9.6.2 with no improvement.
> 
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.

Which node is on which of those, and where is the high load?


> Problem:
> Very high system CPU when master is streaming replication data, CPU
> goes up to 77%. Only one process is generating this load, it's a
> postgresql startup process. When I attached a truss to this process I
> saw a lot o read calls with almost the same number of errors (EAGAIN).

Hm. Just to clarify: The load is on the *receiving* side, in the startup
process?  Because the load doesn't quite look that way...


> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> 
> Descriptor 6 is a pipe

That's presumably the latch's internal self-pipe.  Could you redo that
truss/strace with timestamps attached?  Does truss show signals
received? The above profile would e.g. make a lot more sense if not.  Is
the wal receiver sending signals?


> Read call try to read one byte over and over, I looked up to source
> code and I think this file is responsible for this behavior
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

It was "just" moved (and expanded), used to be at
src/backend/port/unix_latch.c.

There normally shouldn't be that much "latch traffic" in the startup
process; we'd expect it to block from within WaitForWALToBecomeAvailable().

Hm.  Any chance you've configured a recovery_min_apply_delay?  Although
I'd expect more timestamp calls in that case.


Greetings,

Andres Freund




[HACKERS] tzdata2017a breaks timestamptz regression test

2017-03-09 Thread Tom Lane
I typically build with --with-system-tzdata, which means that any time
Red Hat sees fit to push out a new copy of the tzdata package, that's
what I'm testing against.  This morning they updated to tzdata2017a,
and I'm seeing the attached test failures.

The reason for this is that the IANA crew have really moved forward
aggressively on their project to remove invented zone abbreviations.
As stated at
http://mm.icann.org/pipermail/tz-announce/2017-February/45.html
they've removed text abbreviations for pretty much all of South
America, which is what's breaking these cases.

For the cases involving the America/Santiago zone, I'm a bit inclined
to just switch that to America/New_York, which seems much less likely
to get fooled with by IANA.  But I'm wondering if Alvaro had a specific
reason for using the Santiago zone in those test cases.

The diffs involving VET (America/Caracas) are more problematic: we're
intentionally trying to test a time-varying zone abbreviation, so we can't
just switch to a stable zone.  As far as the lines 2395..2437 hunk goes,
we could perhaps make that segment of the test print in ISO datestyle
instead of Postgres datestyle, which would force the zone to be printed in
numeric form not as an abbreviation.  That's slightly annoying, but it
doesn't really compromise what that part of the test is trying to test.
However, the very last hunk indicates that VET no longer functions as a
time-varying zone abbreviation at all, because it doesn't match any
abbreviation recorded for America/Caracas in the IANA database, so we fall
back to treating it as a simple synonym for America/Caracas (cf commit
39b691f25).  So that's now failing altogether to test what it means to.

What I propose we do about that is replace the America/Caracas test cases
with Europe/Moscow tests, moving the dates as needed to match DST
transitions from when Moscow was observing DST (pre 2011).  The comments
in the IANA files indicate that they believe the MSK/MSD abbreviations
have or had real-world usage, so they probably won't replace them with
numeric offsets.  We can hope, anyway.
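
A replacement test might then look something like this (a sketch, not
the actual proposal; dates chosen inside Moscow's pre-2011 DST era):

SET timezone = 'Europe/Moscow';
SELECT '2010-07-15 08:15:55 MSD'::timestamptz;  -- daylight-saving time
SELECT '2010-01-15 08:15:55 MSK'::timestamptz;  -- standard time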

regards, tom lane

*** /home/postgres/pgsql/src/test/regress/expected/timestamptz.out	Mon Jan 30 17:03:46 2017
--- /home/postgres/pgsql/src/test/regress/results/timestamptz.out	Thu Mar  9 13:35:41 2017
***
*** 1778,1796 
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33);
  make_timestamptz 
  -
!  Sun Jul 15 08:15:55.33 1973 CLT
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '+2');
  make_timestamptz 
  -
!  Sun Jul 15 02:15:55.33 1973 CLT
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '-2');
  make_timestamptz 
  -
!  Sun Jul 15 06:15:55.33 1973 CLT
  (1 row)
  
  WITH tzs (tz) AS (VALUES
--- 1778,1796 
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33);
  make_timestamptz 
  -
!  Sun Jul 15 08:15:55.33 1973 -04
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '+2');
  make_timestamptz 
  -
!  Sun Jul 15 02:15:55.33 1973 -04
  (1 row)
  
  SELECT make_timestamptz(1973, 07, 15, 08, 15, 55.33, '-2');
  make_timestamptz 
  -
!  Sun Jul 15 06:15:55.33 1973 -04
  (1 row)
  
  WITH tzs (tz) AS (VALUES
***
*** 1799,1821 
  ('+10:00:1'), ('+10:00:01'),
  ('+10:00:10'))
   SELECT make_timestamptz(2010, 2, 27, 3, 45, 00, tz), tz FROM tzs;
!make_timestamptz|tz 
! ---+---
!  Fri Feb 26 23:45:00 2010 CLST | +1
!  Fri Feb 26 23:45:00 2010 CLST | +1:
!  Fri Feb 26 23:45:00 2010 CLST | +1:0
!  Fri Feb 26 23:45:00 2010 CLST | +100
!  Fri Feb 26 23:45:00 2010 CLST | +1:00
!  Fri Feb 26 23:45:00 2010 CLST | +01:00
!  Fri Feb 26 14:45:00 2010 CLST | +10
!  Fri Feb 26 14:45:00 2010 CLST | +1000
!  Fri Feb 26 14:45:00 2010 CLST | +10:
!  Fri Feb 26 14:45:00 2010 CLST | +10:0
!  Fri Feb 26 14:45:00 2010 CLST | +10:00
!  Fri Feb 26 14:45:00 2010 CLST | +10:00:
!  Fri Feb 26 14:44:59 2010 CLST | +10:00:1
!  Fri Feb 26 14:44:59 2010 CLST | +10:00:01
!  Fri Feb 26 14:44:50 2010 CLST | +10:00:10
  (15 rows)
  
  -- these should fail
--- 1799,1821 
  ('+10:00:1'), ('+10:00:01'),
  ('+10:00:10'))
   SELECT make_timestamptz(2010, 2, 27, 3, 45, 00, tz), tz FROM tzs;
!make_timestamptz   |tz 
! --+---
!  Fri Feb 26 23:45:00 2010 -03 | +1
!  Fri Feb 26 23:45:00 2010 -03 | +1:
!  Fri Feb 26 23:45:00 2010 -03 | +1:0
!  Fri Feb 26 23:45:00 2010 -03 | +100
!  Fri Feb 26 23:45:00 2010 -03 | +1:00
!  Fri Feb 26 23:45:00 2010 -03 | +01:00
!  Fri Feb 26 14:45:00 2010 -03 | +10
!  Fri Feb 26 14:45:00 2010 -03 | +1000
!  Fri F

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Peter Eisentraut
Perhaps I'm confused by the title of this thread/CF entry, but
background workers already do show up in pg_stat_activity.  (You can
verify that by testing the background sessions patch.)  So which
additional things are we aiming to see with this?

In practice, I think it's common to do a quick select * from
pg_stat_activity to determine whether a database instance is in use.
(You always see your own session, but that's easy to eyeball.)  If we
add all the various background processes by default, that will make
things harder, especially if there is no straightforward way to filter
them out.
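
With the proposed type column (per the 0004 patch upthread), filtering
would at least be possible; hypothetical column name and value:

SELECT * FROM pg_stat_activity WHERE proc_type = 'backend';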

Perhaps a pg_stat_user_* and pg_stat_system_* split like we have for
some of the statistics tables would be useful?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Documentation improvements for partitioning

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> This is about list-ifying a note, but I think we should try to
> de-note-ify it.  It's a giant block of text that is not obviously more
> noteworthy than the surrounding text; I think  should be saved
> for things that particularly need to be called out.

Yeah.  A big problem with the markup we use, imo, is that  segments
are displayed in a way that makes them more prominent than the surrounding
text, not less so.  That doesn't really square with my intuitive view of
what a  ought to be used for; it forces it to be considered as
something only slightly less dangerous than a  or , not
as a parenthetical remark.  But that's what we have to deal with so
we should mark up our text accordingly.

regards, tom lane




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-03-09 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
>> If you don't want to make ExecInitExpr responsible, then the planner would
>> have to do something like split_pathtarget_at_srf anyway to decompose the
>> expressions, no matter which executor representation we use.

> Did we do anything about this?  Are we going to?

No, and I think we should.  Is it on the v10 open items list?

regards, tom lane




Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-03-09 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Mon, Jan 30, 2017 at 6:54 PM, Tom Lane  wrote:
> >> If you don't want to make ExecInitExpr responsible, then the planner would
> >> have to do something like split_pathtarget_at_srf anyway to decompose the
> >> expressions, no matter which executor representation we use.
> 
> > Did we do anything about this?  Are we going to?
> 
> No, and I think we should.  Is it on the v10 open items list?

Wasn't, I've added it now:

https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items

Thanks!

Stephen




Re: [HACKERS] adding an immutable variant of to_date

2017-03-09 Thread Sven R. Kunze

Thanks!

On 08.03.2017 17:37, Andreas Karlsson wrote:
The current code for to_char will on the first call to to_char build 
arrays with the localized names of the week days and the months. I 
suspect that you may need to build something similar but a set of 
arrays per locale.


Not sure what the most C-like way of solving this is. Coming from a 
Python background, I would tend to use a dict. The keys are the locales, 
the values are the arrays.


I would use something like what is described here (preferably something
libc-related, like hcreate and hsearch):
http://stackoverflow.com/questions/3269881/how-to-represent-a-python-like-dictionary-in-c



See the DCH_to_char function and its call to cache_locale_time.


Alright. Found the cache here:

 src/backend/utils/adt/pg_locale.c
/* lc_time localization cache */
char   *localized_abbrev_days[7];
char   *localized_full_days[7];
char   *localized_abbrev_months[12];
char   *localized_full_months[12];

/*  */

void
cache_locale_time(void)
{
char   *save_lc_time;
time_t  timenow;
struct tm  *timeinfo;
int i;

#ifdef WIN32
char   *save_lc_ctype;
#endif

/* did we do this already? */
if (CurrentLCTimeValid)
return;

I would replace the invalidation check with a hsearch lookup.
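
A rough sketch of what I have in mind, with purely illustrative names
(build_lc_time_cache stands in for the strftime loop that
cache_single_time does today, and hcreate() is assumed to have been
called once at startup):

#include <search.h>
#include <stdlib.h>
#include <string.h>

struct lc_time_cache
{
	char	   *abbrev_days[7];
	char	   *full_days[7];
	char	   *abbrev_months[12];
	char	   *full_months[12];
};

/* hypothetical builder: fills one cache entry for the given locale */
extern struct lc_time_cache *build_lc_time_cache(const char *locale_name);

static struct lc_time_cache *
lookup_lc_time_cache(const char *locale_name)
{
	ENTRY		item;
	ENTRY	   *found;

	item.key = (char *) locale_name;
	item.data = NULL;

	/* hsearch(..., FIND) returns NULL if the key is not yet in the table */
	found = hsearch(item, FIND);
	if (found)
		return (struct lc_time_cache *) found->data;

	/* first use of this locale: build the arrays and remember them */
	item.key = strdup(locale_name);
	item.data = build_lc_time_cache(locale_name);
	found = hsearch(item, ENTER);	/* NULL only if the table is full */
	return found ? (struct lc_time_cache *) found->data : NULL;
}

The lookup would then replace the CurrentLCTimeValid check, and the
global array pointers can be pointed at whatever entry comes back.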



Grepping for the cache variables, I found:

~/src/postgres$ grep -r localized_abbrev_days *
src/backend/utils/adt/pg_locale.c:char *localized_abbrev_days[7];
src/backend/utils/adt/pg_locale.c: cache_single_time(&localized_abbrev_days[i], "%a", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_abbrev_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_abbrev_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_abbrev_days[tm->tm_wday], collid);
src/include/utils/pg_locale.h:extern char *localized_abbrev_days[];

~/src/postgres$ grep -r localized_full_days *
src/backend/utils/adt/pg_locale.c:char *localized_full_days[7];
src/backend/utils/adt/pg_locale.c: cache_single_time(&localized_full_days[i], "%A", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_full_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_full_days[tm->tm_wday], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_full_days[tm->tm_wday], collid);
src/include/utils/pg_locale.h:extern char *localized_full_days[];

~/src/postgres$ grep -r localized_abbrev_months *
src/backend/utils/adt/pg_locale.c:char *localized_abbrev_months[12];
src/backend/utils/adt/pg_locale.c: cache_single_time(&localized_abbrev_months[i], "%b", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_abbrev_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_abbrev_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_abbrev_months[tm->tm_mon - 1], collid);
src/include/utils/pg_locale.h:extern char *localized_abbrev_months[];

srkunze@mexico:~/src/postgres$ grep -r localized_full_months *
src/backend/utils/adt/pg_locale.c:char *localized_full_months[12];
src/backend/utils/adt/pg_locale.c: cache_single_time(&localized_full_months[i], "%B", timeinfo);
src/backend/utils/adt/formatting.c:char *str = str_toupper_z(localized_full_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_initcap_z(localized_full_months[tm->tm_mon - 1], collid);
src/backend/utils/adt/formatting.c:char *str = str_tolower_z(localized_full_months[tm->tm_mon - 1], collid);
src/include/utils/pg_locale.h:extern char *localized_full_months[];


It seems to me that I would need to make them point to the correct
arrays at the beginning of those functions by doing an hsearch lookup.


That step is basically independent of the actual immutable to_date idea.


What do you think?

Best,
Sven





Re: [HACKERS] adding an immutable variant of to_date

2017-03-09 Thread Sven R. Kunze

On 08.03.2017 03:37, Robert Haas wrote:

The commit message for 64353640e87b54250f1b8a1d2a708d270dff4bfd has
some interesting perspective on this.


"""
Also, mark to_date() and to_char(interval) as stable; although these appear
not to depend on any GUC variables as of CVS HEAD, that seems a property
unlikely to survive future improvements.  It seems best to mark all the
formatting functions stable and be done with it.
"""

My take away from this commit is the following:

Historically, the immutability of "to_date(text, text)" was an emergent
property rather than a design goal. Once the function was shown to be
potentially mutable, the parsing features took priority, so the
IMMUTABLE marking had to be dropped.


The proposed variant, on the other hand, should put immutability first,
before everything else, so that future improvements cannot violate it.
This might restrict the parsing functionality that can be offered, but
that shouldn't be a problem in practice.
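
For the record, the practical motivation is expression indexes: with the
current STABLE marking, something like this (table and column names made
up) is rejected:

CREATE INDEX ON events (to_date(raw_date, 'YYYY-MM-DD'));
ERROR:  functions in index expression must be marked IMMUTABLE

An immutable variant would make such indexes possible.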


Regards,
Sven




[HACKERS] Upgrading postmaster's log messages about bind/listen errors

2017-03-09 Thread Tom Lane
Over in
https://www.postgresql.org/message-id/flat/201703072317.01345.john.iliffe%40iliffe.ca
we spent quite a lot of effort to diagnose what turned out to be a simple
networking misconfiguration.  It would probably have taken a lot less
effort if the postmaster were more forthcoming about exactly what address
it's trying to bind to.  I seem to recall having wanted to include that
info in the messages many years ago, but at the time we lacked any
reasonably-portable way to decode a struct addrinfo.  Now we have
pg_getnameinfo_all(), so PFA a patch to include the specific address in
any complaint about failures in the socket/bind/listen sequence.
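
With the patch, a typical failure would come out along these lines
(address and errno invented for illustration):

LOG:  could not bind IPv4 address "192.0.2.1": Cannot assign requested address
HINT:  Is another postmaster already running on port 5432? If not, wait a few seconds and retry.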

For good measure I also added a DEBUG1 log message reporting successful
binding to a port.  I'm not sure if there's an argument for putting this
out at LOG level (i.e. by default) --- any thoughts about that?

There are probably a couple of example messages in the SGML docs that
would need to be updated, but I've not trawled for them yet.

regards, tom lane

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 7939b1f..3f3b2f2 100644
*** a/src/backend/libpq/pqcomm.c
--- b/src/backend/libpq/pqcomm.c
*** StreamServerPort(int family, char *hostN
*** 319,324 
--- 319,326 
  	char		portNumberStr[32];
  	const char *familyDesc;
  	char		familyDescBuf[64];
+ 	const char *addrDesc;
+ 	char		addrBuf[NI_MAXHOST];
  	char	   *service;
  	struct addrinfo *addrs = NULL,
  			   *addr;
*** StreamServerPort(int family, char *hostN
*** 407,413 
  			break;
  		}
  
! 		/* set up family name for possible error messages */
  		switch (addr->ai_family)
  		{
  			case AF_INET:
--- 409,415 
  			break;
  		}
  
! 		/* set up address family name for log messages */
  		switch (addr->ai_family)
  		{
  			case AF_INET:
*** StreamServerPort(int family, char *hostN
*** 431,443 
  break;
  		}
  
  		if ((fd = socket(addr->ai_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
  		{
  			ereport(LOG,
  	(errcode_for_socket_access(),
! 			/* translator: %s is IPv4, IPv6, or Unix */
! 	 errmsg("could not create %s socket: %m",
! 			familyDesc)));
  			continue;
  		}
  
--- 433,460 
  break;
  		}
  
+ 		/* set up text form of address for log messages */
+ #ifdef HAVE_UNIX_SOCKETS
+ 		if (addr->ai_family == AF_UNIX)
+ 			addrDesc = unixSocketPath;
+ 		else
+ #endif
+ 		{
+ 			pg_getnameinfo_all((const struct sockaddr_storage *) addr->ai_addr,
+ 			   addr->ai_addrlen,
+ 			   addrBuf, sizeof(addrBuf),
+ 			   NULL, 0,
+ 			   NI_NUMERICHOST);
+ 			addrDesc = addrBuf;
+ 		}
+ 
  		if ((fd = socket(addr->ai_family, SOCK_STREAM, 0)) == PGINVALID_SOCKET)
  		{
  			ereport(LOG,
  	(errcode_for_socket_access(),
! 			/* translator: first %s is IPv4, IPv6, or Unix */
!   errmsg("could not create %s socket for address \"%s\": %m",
! 		 familyDesc, addrDesc)));
  			continue;
  		}
  
*** StreamServerPort(int family, char *hostN
*** 461,467 
  			{
  ereport(LOG,
  		(errcode_for_socket_access(),
! 		 errmsg("setsockopt(SO_REUSEADDR) failed: %m")));
  closesocket(fd);
  continue;
  			}
--- 478,486 
  			{
  ereport(LOG,
  		(errcode_for_socket_access(),
! /* translator: first %s is IPv4, IPv6, or Unix */
! 		 errmsg("setsockopt(SO_REUSEADDR) failed for %s address \"%s\": %m",
! familyDesc, addrDesc)));
  closesocket(fd);
  continue;
  			}
*** StreamServerPort(int family, char *hostN
*** 476,482 
  			{
  ereport(LOG,
  		(errcode_for_socket_access(),
! 		 errmsg("setsockopt(IPV6_V6ONLY) failed: %m")));
  closesocket(fd);
  continue;
  			}
--- 495,503 
  			{
  ereport(LOG,
  		(errcode_for_socket_access(),
! /* translator: first %s is IPv4, IPv6, or Unix */
! 		 errmsg("setsockopt(IPV6_V6ONLY) failed for %s address \"%s\": %m",
! familyDesc, addrDesc)));
  closesocket(fd);
  continue;
  			}
*** StreamServerPort(int family, char *hostN
*** 494,502 
  		{
  			ereport(LOG,
  	(errcode_for_socket_access(),
! 			/* translator: %s is IPv4, IPv6, or Unix */
! 	 errmsg("could not bind %s socket: %m",
! 			familyDesc),
  	 (IS_AF_UNIX(addr->ai_family)) ?
    errhint("Is another postmaster already running on port %d?"
  		  " If not, remove socket file \"%s\" and retry.",
--- 515,523 
  		{
  			ereport(LOG,
  	(errcode_for_socket_access(),
! 			/* translator: first %s is IPv4, IPv6, or Unix */
! 	 errmsg("could not bind %s address \"%s\": %m",
! 			familyDesc, addrDesc),
  	 (IS_AF_UNIX(addr->ai_family)) ?
    errhint("Is another postmaster already running on port %d?"
  		  " If not, remove socket file \"%s\" and retry.",
*** StreamServerPort(int family, char *hostN
*** 533,544 
  		{
  			ereport(LOG,
  	(errcode_for_socket_acce

Re: [HACKERS] cast result of copyNode()

2017-03-09 Thread Peter Eisentraut
On 3/7/17 18:27, Mark Dilger wrote:
> You appear to be using a #define macro to wrap a function of the same name
> with the code:
> 
> #define copyObject(obj) ((typeof(obj)) copyObject(obj))

Yeah, that's a bit silly.  Here is an updated version that changes that.
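
The gist of the change: the implementation function gets a new name and
the macro wraps it, roughly like this (modulo the configure test for
typeof):

#ifdef HAVE_TYPEOF
#define copyObject(obj) ((typeof(obj)) copyObjectImpl(obj))
#else
#define copyObject(obj) copyObjectImpl(obj)
#endif

extern void *copyObjectImpl(const void *from);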

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 696e605c358e7ca5786a13c82504e9d7dbed5eb4 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 9 Mar 2017 15:18:59 -0500
Subject: [PATCH v2] Cast result of copyObject() to correct type

copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.

If the compiler supports typeof or something similar, cast the result to
the input type.  This provides a measure of type safety.  In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in the normal case casts are
now unnecessary, and their presence indicates that something unusual is
happening.
---
 config/c-compiler.m4  | 27 +
 configure | 40 +++
 configure.in  |  1 +
 src/backend/bootstrap/bootstrap.c |  4 ++--
 src/backend/commands/copy.c   |  2 +-
 src/backend/commands/createas.c   |  2 +-
 src/backend/commands/event_trigger.c  |  8 +++
 src/backend/commands/prepare.c|  4 ++--
 src/backend/commands/view.c   |  2 +-
 src/backend/nodes/copyfuncs.c |  8 +++
 src/backend/optimizer/path/indxpath.c |  4 ++--
 src/backend/optimizer/plan/createplan.c   |  6 ++---
 src/backend/optimizer/plan/initsplan.c|  8 +++
 src/backend/optimizer/plan/planagg.c  |  4 ++--
 src/backend/optimizer/plan/planner.c  |  4 ++--
 src/backend/optimizer/plan/setrefs.c  | 26 ++--
 src/backend/optimizer/plan/subselect.c| 14 +--
 src/backend/optimizer/prep/prepjointree.c |  4 ++--
 src/backend/optimizer/prep/preptlist.c|  2 +-
 src/backend/optimizer/prep/prepunion.c|  4 ++--
 src/backend/optimizer/util/tlist.c| 12 +-
 src/backend/parser/analyze.c  |  2 +-
 src/backend/parser/gram.y |  2 +-
 src/backend/parser/parse_clause.c |  2 +-
 src/backend/parser/parse_expr.c   |  2 +-
 src/backend/parser/parse_relation.c   |  2 +-
 src/backend/parser/parse_utilcmd.c|  6 ++---
 src/backend/rewrite/rewriteHandler.c  |  8 +++
 src/backend/rewrite/rewriteManip.c|  8 +++
 src/backend/tcop/postgres.c   |  6 ++---
 src/backend/utils/cache/plancache.c   | 14 +--
 src/backend/utils/cache/relcache.c|  8 +++
 src/include/nodes/nodes.h |  8 ++-
 src/include/optimizer/tlist.h |  4 ++--
 src/include/pg_config.h.in|  6 +
 src/include/pg_config.h.win32 |  6 +
 36 files changed, 178 insertions(+), 92 deletions(-)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 7d901e1f1a..7afaec5f85 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -178,6 +178,33 @@ fi])# PGAC_C_STATIC_ASSERT
 
 
 
+# PGAC_C_TYPEOF
+# -
+# Check if the C compiler understands typeof or a variant.  Define
+# HAVE_TYPEOF if so, and define 'typeof' to the actual key word.
+#
+AC_DEFUN([PGAC_C_TYPEOF],
+[AC_CACHE_CHECK(for typeof, pgac_cv_c_typeof,
+[pgac_cv_c_typeof=no
+for pgac_kw in typeof __typeof__ decltype; do
+  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([],
+[int x = 0;
+$pgac_kw(x) y;
+y = x;
+return y;])],
+[pgac_cv_c_typeof=$pgac_kw])
+  test "$pgac_cv_c_typeof" != no && break
+done])
+if test "$pgac_cv_c_typeof" != no; then
+  AC_DEFINE(HAVE_TYPEOF, 1,
+[Define to 1 if your compiler understands `typeof' or something similar.])
+  if test "$pgac_cv_c_typeof" != typeof; then
+AC_DEFINE(typeof, $pgac_cv_c_typeof, [Define to how the compiler spells `typeof'.])
+  fi
+fi])# PGAC_C_TYPEOF
+
+
+
 # PGAC_C_TYPES_COMPATIBLE
 # ---
 # Check if the C compiler understands __builtin_types_compatible_p,
diff --git a/configure b/configure
index b5cdebb510..47a1a59e8e 100755
--- a/configure
+++ b/configure
@@ -11399,6 +11399,46 @@ if test x"$pgac_cv__static_assert" = xyes ; then
 $as_echo "#define HAVE__STATIC_ASSERT 1" >>confdefs.h
 
 fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for typeof" >&5
+$as_echo_n "checking for typeof... " >&6; }
+if ${pgac_cv_c_typeof+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_cv_c_typeof=no
+for pgac_kw in typeof __typeof__ decltype; do
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+int x = 0;
+$pgac_kw(x) y;
+y = x;
+return y;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+

Re: [HACKERS] exposing wait events for non-backends (was: Tracking wait event for latches)

2017-03-09 Thread Robert Haas
On Thu, Mar 9, 2017 at 2:30 PM, Peter Eisentraut
 wrote:
> Perhaps I'm confused by the title of this thread/CF entry, but
> background workers already do show up in pg_stat_activity.  (You can
> verify that by testing the background sessions patch.)  So which
> additional things are we aiming to see with this?

All the processes that don't normally show up in pg_stat_activity,
such as auxiliary processes.

> In practice, I think it's common to do a quick select * from
> pg_stat_activity to determine whether a database instance is in use.
> (You always see your own session, but that's easy to eyeball.)  If we
> add all the various background processes by default, that will make
> things harder, especially if there is no straightforward way to filter
> them out.
>
> Perhaps a pg_stat_user_* and pg_stat_system_* split like we have for
> some of the statistics tables would be useful?

I thought of the same kind of thing, and it was discussed upthread.
There seemed to be more votes for keeping it all in one view, but that
could change if more people vote.
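
For what it's worth, if the merged view ends up with a column that
identifies the process type (say, backend_type), filtering back down to
regular sessions stays a one-liner:

SELECT * FROM pg_stat_activity WHERE backend_type = 'client backend';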

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




  1   2   3   >