Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-27 Thread Andres Freund
On 2017-03-23 20:35:09 +1300, Thomas Munro wrote:
> Here is a new patch series responding to feedback from Peter and Andres:

Here's a review of 0007 & 0010 together - they're going to have to be
applied together anyway...


diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index ac339fb566..775c9126c7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3814,6 +3814,21 @@ ANY num_sync ( 
+  cpu_shared_tuple_cost (floating point)
+  
+   cpu_shared_tuple_cost configuration parameter
+  
+  
+  
+   
+Sets the planner's estimate of the cost of sharing rows in
+memory during a parallel query.
+The default is 0.001.
+   
+  
+ 
+

Isn't that really low in comparison to the other costs? I think
specifying a bit more what this actually measures would be good too - is
it putting the tuple in shared memory? Is it accessing it?


+ 
+  cpu_synchronization_cost (floating point)
+  
+   cpu_synchronization_cost configuration parameter
+  
+  
+  
+   
+Sets the planner's estimate of the cost of waiting at synchronization
+points for other processes while executing parallel queries.
+The default is 1.0.
+   
+  
+ 

Isn't this also really cheap in comparison to a, probably cached, seq
page read?


+   if (HashJoinTableIsShared(hashtable))
+   {
+   /*
+    * Synchronize parallel hash table builds.  At this stage we know that
+    * the shared hash table has been created, but we don't know if our
+    * peers are still in MultiExecHash and if so how far through.  We use
+    * the phase to synchronize with them.
+    */
+   barrier = &hashtable->shared->barrier;
+
+   switch (BarrierPhase(barrier))
+   {
+   case PHJ_PHASE_BEGINNING:

Note pgindent will indent this further.  Might be worthwhile to try to
pgindent the file, revert some of the unintended damage.

/*
 * set expression context
 */

I'd still like this to be moved to the start.


@@ -126,17 +202,79 @@ MultiExecHash(HashState *node)
/* Not subject to skew optimization, so insert normally */
ExecHashTableInsert(hashtable, slot, hashvalue);
}
-   hashtable->totalTuples += 1;
+   hashtable->partialTuples += 1;
+   if (!HashJoinTableIsShared(hashtable))
+   hashtable->totalTuples += 1;
}
}

FWIW, I'd put HashJoinTableIsShared() into a local var - the compiler
won't be able to do that on its own because external function calls
could invalidate the result.
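
For instance (a sketch of the hoisting only, not the patch's actual
code; the loop shape follows MultiExecHash):

    bool        table_is_shared = HashJoinTableIsShared(hashtable);

    for (;;)
    {
        slot = ExecProcNode(outerNode);
        if (TupIsNull(slot))
            break;
        /* ... hashing and insertion as in the quoted hunk ... */
        hashtable->partialTuples += 1;
        if (!table_is_shared)
            hashtable->totalTuples += 1;
    }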

That brings me to a related topic: Have you measured whether your
changes cause performance differences?


+   finish_loading(hashtable);

I find the sudden switch to a different naming scheme in the same file a
bit jarring.


+   if (HashJoinTableIsShared(hashtable))
+   BarrierDetach(&hashtable->shared->shrink_barrier);
+
+   if (HashJoinTableIsShared(hashtable))
+   {

Consecutive if blocks with the same condition...
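
i.e. something like this (just a sketch of the obvious merge):

    if (HashJoinTableIsShared(hashtable))
    {
        bool    elected_to_resize;

        BarrierDetach(&hashtable->shared->shrink_barrier);
        /* ... rest of the second block ... */
    }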


+   bool elected_to_resize;
+
+   /*
+    * Wait for all backends to finish building.  If only one worker is
+    * running the building phase because of a non-partial inner plan, the
+    * other workers will pile up here waiting.  If multiple workers are
+    * building, they should finish close to each other in time.
+    */

That comment is outdated, isn't it?

 
/* resize the hash table if needed (NTUP_PER_BUCKET exceeded) */
-   if (hashtable->nbuckets != hashtable->nbuckets_optimal)
-   ExecHashIncreaseNumBuckets(hashtable);
+   ExecHashUpdate(hashtable);
+   ExecHashIncreaseNumBuckets(hashtable);

So this now doesn't actually increase the number of buckets anymore.

+ reinsert:
+   /* If the table was resized, insert tuples into the new buckets. */
+   ExecHashUpdate(hashtable);
+   ExecHashReinsertAll(hashtable);

ReinsertAll just happens to do nothing if we didn't have to
resize... Not entirely obvious, sure reads as if it were unconditional.
Also, it's not actually "All" when batching is in use, no?


+ post_resize:
+   if (HashJoinTableIsShared(hashtable))
+   {
+   Assert(BarrierPhase(barrier) == PHJ_PHASE_RESIZING);
+   BarrierWait(barrier, WAIT_EVENT_HASH_RESIZING);
+   Assert(BarrierPhase(barrier) == PHJ_PHASE_REINSERTING);
+   }
+
+ reinsert:
+   /* If the table was resized, insert tuples into the new buckets. */
+   ExecHashUpdate(hashtable);
+   ExecHashReinsertAll(hashtable);

Hm.  So even non-resizing backends reach this - but they happen to not
do anything 

Re: [HACKERS] Speedup twophase transactions

2017-03-27 Thread Nikhil Sontakke
Please ignore reports about errors in other tests. They seem spurious.

Regards,
Nikhils

On 28 March 2017 at 10:40, Nikhil Sontakke  wrote:

> Hi Michael,
>
> The latest patch looks good. By now doing a single scan of shmem two phase
> data, we have removed the double loops in all the affected functions which
> is good.
>
> My only question is if the added call to restoreTwoPhaseData() is good
> enough to handle all the 3 functions PrescanPreparedTransactions(),
> StandbyRecoverPreparedTransactions() and RecoverPreparedTransactions()
> appropriately? It looks as if it does, but we need to be doubly sure..
>
> PFA, revised patch with a very minor typo fix and rebase against latest
> master. The test cases pass as needed.
>
> Oh, btw, while running TAP tests, I got a few errors in unrelated tests.
>
> "# testing connection parameter "target_session_attrs"
>
> not ok 5 - connect to node master if mode "read-write" and
> master,standby_1 listed
>
>
> #   Failed test 'connect to node master if mode "read-write" and
> master,standby_1 listed'
>
> #   at t/001_stream_rep.pl line 93.
>
> #  got: ''
>
> # expected: '1'
>
> not ok 6 - connect to node master if mode "read-write" and
> standby_1,master listed
>
>
> #   Failed test 'connect to node master if mode "read-write" and
> standby_1,master listed'
>
> #   at t/001_stream_rep.pl line 93.
>
> #  got: ''
>
> # expected: '1'
>
> not ok 7 - connect to node master if mode "any" and master,standby_1 listed
>
>
> #   Failed test 'connect to node master if mode "any" and master,standby_1
> listed'
>
> #   at t/001_stream_rep.pl line 93.
>
> #  got: ''
>
> # expected: '1'
>
> not ok 8 - connect to node standby_1 if mode "any" and standby_1,master
> listed"
>
> Again, not related to this recovery code path, but not sure if others see
> this as well.
>
> Regards,
> Nikhils
>
> On 27 March 2017 at 05:35, Michael Paquier 
> wrote:
>
>> On Sun, Mar 26, 2017 at 4:50 PM, Nikhil Sontakke
>>  wrote:
>> > I was away for a bit. I will take a look at this patch and get back to
>> you
>> > soon.
>>
>> No problem. Thanks for your time!
>> --
>> Michael
>>
>
>
>
> --
>  Nikhil Sontakke   http://www.2ndQuadrant.com/
>  PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
>



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:michael.paqu...@gmail.com]
> Okay. I got the message, and I agree with what you say here. You are right
> by the way, the error messages just use "two-phase file" and not "two-phase
> STATE file", missed that previously.
> --
Thanks, marked as ready for committer, as the code is good and Alexander 
reported test success.

Regards
Takayuki Tsunakawa




Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Mithun Cy
On Mon, Mar 27, 2017 at 11:21 AM, Amit Kapila  wrote:

> I think we can't change the number of buckets to be created or lowmask
> and highmask calculation here without modifying _h_spoolinit() because
> it sorts the data to be inserted based on hashkey which in turn
> depends on the number of buckets that we are going to create during
> create index operation.  We either need to allow create index
> operation to still always create buckets in power-of-two fashion or we
> need to update _h_spoolinit according to new computation.  One minor
> drawback of using power-of-two scheme for creation of buckets during
> create index is that it can lead to wastage of space and will be
> inconsistent with what the patch does during split operation.

Yes, this was a miss. Now the number of buckets allocated during
metap_init is not always a power-of-two number. The hashbuild code,
which uses just the hash_mask to decide which bucket a hashkey belongs
to, is not sufficient: it can give buckets beyond max_buckets, and
sorting of index values based on their buckets will be out of order.
When we then insert them into the hash index we lose the advantage of
the spatial locality which existed before, and hence index-build
performance can suffer.

As you have said we can solve it if we allocate buckets always in
power-of-2 when we do hash index meta page init. But on other
occasions, when we try to double the existing buckets we can do the
allocation in 4 equal phases.

But I think there are 2 more ways to solve the same:

A. Why not pass all 3 parameters high_mask, low_mask, max_buckets to
tuplesort and let it use _hash_hashkey2bucket to figure out which key
belongs to which bucket, and then sort them. I think this way we make
both sorting and insertion into the hash index consistent with each
other.

B. In tuplesort we can use the hash function bucket = hash_key %
num_buckets instead of the existing one, which does a bitwise "and" to
determine the bucket of a hash key. This way we will not wrongly assign
buckets beyond max_buckets, and the sorted hash keys will be in sync
with the actual insertion order of _hash_doinsert.


I am attaching both patches, Patch_A and Patch_B. My preference is
Patch_A, and I am open to suggestions.

>+#define SPLITPOINT_PHASES_PER_GRP 4
>+#define SPLITPOINT_PHASE_MASK (SPLITPOINT_PHASES_PER_GRP - 1)
>+#define Buckets_First_Split_Group 4
Fixed.

>In the above computation +2 and -2 still bothers me.  I think you need
>to do this because you have defined split group zero to have four
>buckets, how about if you don't force that and rather define to have
>split point phases only from split point which has four or more
>buckets?

Okay, as suggested, instead of group zero having 4 phases of 1 bucket
each, I have recalculated the spare mapping as below.
Allocating huge chunks of bucket pages all at once isn't optimal and
we would take ages to consume those. To avoid this exponential growth
of index size, we use a trick to break up the allocation of buckets at
a splitpoint into 4 equal phases.  If (2 ^ x) is the total number of
buckets that needs to be allocated at a splitpoint (from now on we
shall call this a splitpoint group), then we allocate 1/4th (2 ^ (x -
2)) of the total buckets at each phase of the splitpoint group. The
next quarter of the allocation only happens once the buckets of the
previous phase have been consumed.  Since a bucket count < 4 cannot be
divided further into multiple phases, the first 3 groups have only one
phase of allocation. Groups 0, 1, 2 allocate 1, 1, 2 buckets
respectively, at once, in one phase. For the groups > 2, where we
allocate 4 or more buckets, the allocation process is distributed
among four equal phases. At group 3 we allocate 4 buckets in 4
different phases {1, 1, 1, 1}; the numbers in curly braces indicate
the number of buckets allocated within each phase of splitpoint group
3. And for splitpoint groups 4 and 5 the allocation phases will be
{2, 2, 2, 2} (16 buckets in total after the group) and {4, 4, 4, 4}
(32 buckets in total).  We can see that at each splitpoint group we
double the total number of buckets from the previous group, but in
incremental phases.  The bucket pages allocated within one phase of a
splitpoint group will appear consecutively in the index.
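
For illustration, the allocation rule above could be written as (a
sketch, not the patch's actual code):

    static uint32
    buckets_per_phase(uint32 group)
    {
        if (group <= 1)
            return 1;       /* groups 0 and 1: one phase of 1 bucket */
        if (group == 2)
            return 2;       /* group 2: one phase of 2 buckets */
        /* groups >= 3: 2^(group - 1) buckets split across 4 equal phases */
        return ((uint32) 1 << (group - 1)) / 4;
    }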


The sortbuild_hash_*.patch can be applied independently on top of
either expand_hashbucket_efficiently_08 patch.
-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


sortbuild_hash_B.patch
Description: Binary data


sortbuild_hash_A.patch
Description: Binary data


yet_another_expand_hashbucket_efficiently_08.patch
Description: Binary data


expand_hashbucket_efficiently_08.patch
Description: Binary data



Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Michael Paquier
On Tue, Mar 28, 2017 at 1:33 PM, Tsunakawa, Takayuki
 wrote:
> I get the impression that DATA_CORRUPTED means the table data is corrupted, 
> because there's an error code named INDEX_CORRUPTED.

I have interpreted that as the other way around, aka DATA_CORRUPTED
could be applied to 2PC files as well :)
But grepping around, it seems that you are grasping the meaning better
than I do; ERRCODE_DATA_CORRUPTED is only used now for relation pages
or large pages.

>  Anyway, I don't think this patch needs to attach an error code because:
> * Currently, other internal files (not tables or indexes) seem to use 
> INTERNAL_ERROR (XX000).  For example, see ReadControlFile() in xlog.c and 
> pgstat_read_statsfiles() in pgstat.c.
> * It doesn't seem that the user needs distinction.  I don't object to 
> providing a specific error code for this case, but if the patch needs a 
> specific error code to be committed, I'd like to know how that's useful (e.g. 
> how it affects the recovery action the user takes.)
> So, I'd like to mark the patch as ready for committer when ereport() and 
> errmsg() are on separate lines and the message changes to "two-phase state 
> file."

Okay. I got the message, and I agree with what you say here. You are
right by the way, the error messages just use "two-phase file" and not
"two-phase STATE file", missed that previously.
-- 
Michael


0001-Change-detection-of-corrupted-2PC-files-as-FATAL.patch
Description: Binary data


0002-Minimize-window-between-history-file-and-end-of-reco.patch
Description: Binary data



Re: [HACKERS] Speedup twophase transactions

2017-03-27 Thread Nikhil Sontakke
Hi Michael,

The latest patch looks good. By now doing a single scan of shmem two phase
data, we have removed the double loops in all the affected functions which
is good.

My only question is if the added call to restoreTwoPhaseData() is good
enough to handle all the 3
functions PrescanPreparedTransactions(), StandbyRecoverPreparedTransactions()
and RecoverPreparedTransactions() appropriately? It looks as if it does,
but we need to be doubly sure..

PFA, revised patch with a very minor typo fix and rebase against latest
master. The test cases pass as needed.

Oh, btw, while running TAP tests, I got a few errors in unrelated tests.

"# testing connection parameter "target_session_attrs"

not ok 5 - connect to node master if mode "read-write" and master,standby_1
listed


#   Failed test 'connect to node master if mode "read-write" and
master,standby_1 listed'

#   at t/001_stream_rep.pl line 93.

#  got: ''

# expected: '1'

not ok 6 - connect to node master if mode "read-write" and standby_1,master
listed


#   Failed test 'connect to node master if mode "read-write" and
standby_1,master listed'

#   at t/001_stream_rep.pl line 93.

#  got: ''

# expected: '1'

not ok 7 - connect to node master if mode "any" and master,standby_1 listed


#   Failed test 'connect to node master if mode "any" and master,standby_1
listed'

#   at t/001_stream_rep.pl line 93.

#  got: ''

# expected: '1'

not ok 8 - connect to node standby_1 if mode "any" and standby_1,master
listed"

Again, not related to this recovery code path, but not sure if others see
this as well.

Regards,
Nikhils

On 27 March 2017 at 05:35, Michael Paquier 
wrote:

> On Sun, Mar 26, 2017 at 4:50 PM, Nikhil Sontakke
>  wrote:
> > I was away for a bit. I will take a look at this patch and get back to
> you
> > soon.
>
> No problem. Thanks for your time!
> --
> Michael
>



-- 
 Nikhil Sontakke   http://www.2ndQuadrant.com/
 PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services


twophase_recovery_shmem_280317.patch
Description: Binary data



Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-03-27 Thread Vaishnavi Prabakaran
Thanks Craig and Michael for confirming that "PQsetSingleRowMode" should be
called right after "PQbatchQueueProcess".

Michael Paquier wrote:

> It seems to me that
>this should also be effective only during the fetching of one single
>result set. When the client moves on to the next item in the queue we
>should make necessary again a call to PQsetSingleRowMode().

Yes, the current behavior (with the V6 code patch) is exactly the same
as you described above. PQsetSingleRowMode() should be called each time
after PQbatchQueueProcess() to set result processing to single-row
mode.
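
The calling pattern then looks like this (a sketch against the patch's
proposed API; PQbatchQueueProcess comes from this patch, not from
released libpq):

    while (PQbatchQueueProcess(conn))
    {
        PGresult   *res;

        /* must be requested again for every result set in the batch */
        if (!PQsetSingleRowMode(conn))
            fprintf(stderr, "could not enable single-row mode\n");

        while ((res = PQgetResult(conn)) != NULL)
        {
            /* PGRES_SINGLE_TUPLE results arrive one row at a time */
            PQclear(res);
        }
    }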

>src/test/modules/Makefile should include test_libpq.

Yes, added test_libpq to the list in Makefile.

Daniel Verite wrote:

>> Please let me know if you think this is not enough but wanted to update
>> section 33.6 also?

>Yes, if the right place to call PQsetSingleRowMode() is immediately
>after PQbatchQueueProcess(), I think we need to update
>"33.6. Retrieving Query Results Row-By-Row"
>with that information, otherwise what it says is just wrong
>when in batch mode.

Yes, I have updated Chapter 33.6 by adding note for batch mode.

>Also, in 33.5, I suggest to not use the "currently executing
>query" as a synonym for the "query currently being processed"
>to avoid any confusion between what the backend is executing
>and a prior query whose results are being collected by the client
>at the same moment.

Yes correct, I modified the words to "query currently being processed" as
suggested.



> One bit of functionality that does not work in batch mode and is left
> as is by the patch is the PQfn() fast path interface and the large object
> functions that use it.
>
> Currently, calling lo_* functions inside a batch will fail with a message
> that depends on whether the internal lo_initialize() has been successfully
> called before.
>
> If it hasn't, PQerrorMessage() will be:
>"Synchronous command execution functions are not allowed in batch mode"
> which is fine, but it comes by happenstance from lo_initialize()
> calling PQexec() to fetch the function OIDs from pg_catalog.pg_proc.
>
> OTOH, if lo_initialize() has succeeded before, a call to a large
> object function will fail with a different message:
>   "connection in wrong state"
> which is emitted by PQfn() based on conn->asyncStatus != PGASYNC_IDLE
>
> Having an unified error message would be more user friendly.
>
>
Thanks for finding this out, and yes, I added a new check in PQfn() to
give the same error message - "Synchronous command execution functions
are not allowed in batch mode" - when called in batch mode.
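
The guard is of roughly this shape (a sketch; the batch-state field
names are assumptions, not the exact patch code):

    if (conn->batch_status != PQBATCH_MODE_OFF)
    {
        printfPQExpBuffer(&conn->errorMessage,
                          libpq_gettext("Synchronous command execution functions are not allowed in batch mode\n"));
        return NULL;
    }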



> Concerning the doc, when saying in 33.5.2:
>  "In batch mode only asynchronous operations are permitted"
> the server-side functions are indeed ruled out, since PQfn() is
> synchronous, but maybe we should be a bit more explicit
> about that?


> Chapter 34.3 (Large Objects / Client Interfaces) could also
> mention the incompatibility with batch mode.
>
>
Updated 33.5.2 to be clearer about which functions are allowed and
which are not. Updated Chapter 34.3 (Large Objects / Client Interfaces)
to let the user know about the incompatibility with batch mode.


Attached is the latest patch, and here is the regression test run result:
ok 1 - testlibpqbatch disallowed_in_batch
ok 2 - testlibpqbatch simple_batch
ok 3 - testlibpqbatch multi_batch
ok 4 - testlibpqbatch batch_abort
ok 5 - testlibpqbatch timings
ok 6 - testlibpqbatch copyfailure
ok 7 - testlibpqbatch test_singlerowmode
ok
All tests successful.
Files=1, Tests=7,  5 wallclock secs ( 0.01 usr  0.00 sys +  1.79 cusr  0.35
csys =  2.15 CPU)
Result: PASS

Thanks & Regards,
Vaishnavi,
Fujitsu Australia.


0001-Pipelining-batch-support-for-libpq-code-v7.patch
Description: Binary data


0002-Pipelining-batch-support-for-libpq-test-v4.patch
Description: Binary data



Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-27 Thread Haribabu Kommi
On Tue, Mar 28, 2017 at 3:40 PM, Haribabu Kommi 
wrote:

>
> update patch attached.
>

Forgot to execute git commit, new patch attached.

Regards,
Hari Babu
Fujitsu Australia


pg_stat_walwrites_view_4.patch
Description: Binary data



Re: [HACKERS] pg_stat_wal_write statistics view

2017-03-27 Thread Haribabu Kommi
On Mon, Mar 27, 2017 at 1:27 PM, Haribabu Kommi 
wrote:

>
>
> On Sat, Mar 25, 2017 at 6:40 AM, Fujii Masao 
> wrote:
>
>> On Wed, Feb 15, 2017 at 12:53 PM, Haribabu Kommi
>>  wrote:
>> >
>> >
>> > On Wed, Feb 8, 2017 at 9:36 PM, Amit Kapila 
>> wrote:
>> >>
>> >> On Tue, Feb 7, 2017 at 11:47 AM, Haribabu Kommi
>> >>  wrote:
>> >> > Hi Hackers,
>> >> >
>> >> > I just want to discuss adding of a new statistics view that provides
>> >> > the information of wal writing details as follows
>> >> >
>> >>
>> >> +1.  I think it will be useful to observe WAL activity.
>> >
>> >
>> > Thanks for your opinion.
>> >
>> >> > postgres=# \d pg_stat_wal_writer
>> >> > View "pg_catalog.pg_stat_wal_writer"
>> >> >        Column       |           Type           | Collation | Nullable | Default
>> >> > --------------------+--------------------------+-----------+----------+---------
>> >> >  num_backend_writes | bigint                   |           |          |
>> >> >  num_total_writes   | bigint                   |           |          |
>> >> >  num_blocks         | bigint                   |           |          |
>> >> >  total_write_time   | bigint                   |           |          |
>> >> >  stats_reset        | timestamp with time zone |           |          |
>> >> >
>> >> > The columns of the view are
>> >> > 1. Total number of xlog writes that are called from the backend.
>> >> > 2. Total number of xlog writes that are called from both backend
>> >> >  and background workers. (This column can be changed to just
>> >> >  display on the background writes).
>> >> > 3. The number of the blocks that are written.
>> >> > 4. Total write_time of the IO operation it took; this variable data is
>> >> > filled only when the track_io_timing GUC is enabled.
>> >>
>> >> So, here is *write_time* the total time system has spent in WAL
>> >> writing before the last reset?
>> >
>> >
>> > total write_time spent in WAL writing "after" the last reset in
>> > milliseconds.
>> >
>> >> I think there should be a separate column for write and sync time.
>> >>
>> >
>> > Added.
>> >
>> >>
>> >> > Or it is possible to integrate the new columns into the existing
>> >> > pg_stat_bgwriter view also.
>> >> >
>> >>
>> >> I feel separate view is better.
>> >
>> >
>> > Ok.
>> >
>> > Following the sample out of the view after regress run.
>> >
>> > postgres=# select * from pg_stat_walwrites;
>> > -[ RECORD 1 ]--+--
>> > backend_writes | 19092
>> > writes | 663
>> > write_blocks   | 56116
>> > write_time | 0
>> > sync_time  | 3064
>> > stats_reset| 2017-02-15 13:37:09.454314+11
>> >
>> > Currently, writer, walwriter and checkpointer processes
>> > are considered as background processes that can do
>> > the wal write mainly.
>>
>
> Thanks for the review.
>
>> I'm not sure if this categorization is good. You told that this view
>> is useful to tune walwriter parameters at the top of this thread. If
>> so, ISTM that the information about walwriter's activity should be
>> separated from others.
>>
>
> Yes, that's correct. First I thought of providing the statistics of
> walwriter, but later in development it turned into showing statistics
> of all WAL write activity of background processes as well, to
> differentiate it from the actual writes by the backends.
>
>
>> What about other processes which *can* write WAL, for example walsender
>> (e.g., BASE_BACKUP can cause WAL record), startup process (e.g., end-of-
>> recovery checkpoint) and logical replication worker (Not sure if it always
>> works with synchronous_commit=off, though). There might be other processes
>> which can write WAL
>
>
> It is possible to add the walsender, startup and other processes
> easily, but not background workers that do some WAL write operations,
> unless they report the stats with pgstat_report_stat(). Is it fine to
> ignore the workers that do not report the stats?
>

Added stats collection for the walsender, startup and autovacuum
processes. Background workers that call the pgstat_report_stat()
function will automatically be included.


>
>
>> Why didn't you separate "write_blocks", "write_time" and "sync_time" per
>> the process category, like "backend_writes" and "writes"?
>>
>
> Ok. I will add those columns.
>

Added the new columns to differentiate backend and background processes.

>> This view doesn't seem to take into consideration the WAL writing and
>> flushing during creating a new WAL segment file.
>>
>> I think that it's better to make this view report also the number of WAL
>> pages
>> which are written when wal_buffer is full. This information is useful to
>> tune the size of wal_buffers. This was proposed by Nagayasu before.
>> https://www.postgresql.org/message-id/4ff824f3.5090...@uptime.jp
>
>
> Ok. But this new column just shows how many times 

Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> While I agree with that in general, we are talking about 2PC files that are
> on disk in patch 0001, so I would think that in this case
> ERRCODE_DATA_CORRUPTED is the most adapted (or
> ERRCODE_TWOPHASE_CORRUPTED?).
> 
> The other WARNING messages refer to stale files of already committed
> transactions, which are not actually corrupted. What about in this case
> having a ERRCODE_TWOPHASE_INVALID?
> 
> Updated patches are attached, I did not change the WARNING portion though
> as I am not sure what's the consensus on the matter.
> 

I get the impression that DATA_CORRUPTED means the table data is corrupted, 
because there's an error code named INDEX_CORRUPTED.  Anyway, I don't think 
this patch needs to attach an error code because:

* Currently, other internal files (not tables or indexes) seem to use 
INTERNAL_ERROR (XX000).  For example, see ReadControlFile() in xlog.c and 
pgstat_read_statsfiles() in pgstat.c.

* It doesn't seem that the user needs distinction.  I don't object to providing 
a specific error code for this case, but if the patch needs a specific error 
code to be committed, I'd like to know how that's useful (e.g. how it affects 
the recovery action the user takes.)


So, I'd like to mark the patch as ready for committer when ereport() and 
errmsg() are on separate lines and the message changes to "two-phase state 
file."

Regards
Takayuki Tsunakawa







Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tomas Vondra



On 03/27/2017 01:40 PM, Rushabh Lathia wrote:


...
I was doing more testing with the patch and I found one more server
crash around the same area, when we forced gather merge for a scan
having zero rows.

create table dept (deptno numeric, dname varchar(20));
set parallel_tuple_cost =0;
set parallel_setup_cost =0;
set min_parallel_table_scan_size =0;
set min_parallel_index_scan_size =0;
set force_parallel_mode=regress;
explain analyze select * from dept order by deptno;

This is because for the leader we don't initialize the slot in
gm_slots. So in the case where zero workers are launched and the table
has zero rows, we end up with a NULL slot in the gm_slots array.

Currently gather_merge_clear_slots() clears out the tuple table slots
for each gather merge input and returns a cleared slot. In the patch I
modified gather_merge_clear_slots() to just clear out the tuple table
slots and always return NULL when all the queues and the heap are
exhausted.



Isn't that just another sign that the code might be a bit too
confusing? I see two main issues in the code:


1) allocating 'slots' as 'nreaders+1' elements, which seems like a good 
way to cause off-by-one errors


2) mixing objects with different life spans (slots for readers vs. slot 
for the leader) seems like a bad idea too


I wonder how much we gain by reusing the slot from the leader (I'd be 
surprised if it was at all measurable). David posted a patch reworking 
this and significantly simplifying the GatherMerge node. Why not accept 
that?
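
A rough sketch of the direction such a rework takes (field names
assumed here, not necessarily David's actual patch):

    /* keep the readers' slots and the leader's slot separate */
    gm_state->reader_slots = (TupleTableSlot **)
        palloc0(nreaders * sizeof(TupleTableSlot *));
    gm_state->leader_slot = NULL;   /* filled from the leader's own node */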


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] free space map and visibility map

2017-03-27 Thread Kyotaro HORIGUCHI
I'd like to have a comment from Heikki or Tom.

At Mon, 27 Mar 2017 16:49:08 +0900, Masahiko Sawada  
wrote in 
> On Mon, Mar 27, 2017 at 2:38 PM, Kyotaro HORIGUCHI
>  wrote:
> > Other than by FPI, FSM update is omitted when record LSN is older
> > than page LSN. If heap page is evicted but FSM page is not after
> > vacuuming and before power cut, replaying HEAP2_CLEAN skips
> > update of FSM even though FPI is not attached. Of course this
> > cannot occur on standby. One FSM page covers as many heap pages
> > as about 4k, so FSM can stay far longer than heap pages.
> >
> > ALL_FROZEN is set by records other than HEAP2_FREEZE_PAGE. When a page
> > is already empty when entering lazy_scan_heap, or a page of
> > non-indexed heap is emptied in lazy_scan_heap, HEAP2_VISIBLE is
> > issued to set ALL_FROZEN.
> >
> > Perhaps the problem will be fixed by forcing heap_xlog_visible to
> > update the FSM (in addition to FREEZE_PAGE), or the same in
> > heap_xlog_clean. (As mentioned in the previous mail, I prefer the
> > latter.)
> 
> Maybe it's enough just to make both heap_xlog_visible and
> heap_xlog_freeze_page forcibly update the FSM (heap_xlog_freeze_page
> might be unnecessary). Because the problem happens on the page that is
> full according to FSM but is empty and marked as all-visible or

It would work and straightforward.

Currently the FSM seems to be treated as a part of the heap from the
point of view of WAL. From that point of view, the problem is that
heap_xlog_clean omits updating the FSM in certain cases. My only
concern is whether updating heap information from a visibility map
record is right or not. The code intends to reduce FSM updates without
causing problems. For the insert/update cases, the problem is that
too-large freespace information in the FSM can cause needless fetches
of heap pages. But things are a bit different for the clean case: the
problem there is too-small freespace information that causes
everlastingly empty pages.

I dug out the original discussion; the mention of this was found
here.

https://www.postgresql.org/message-id/24334.1225205478%40sss.pgh.pa.us

Tom Lane wrote:
| Heikki Linnakangas  writes:
| > One issue with this patch is that it doesn't update the FSM at all when 
| > pages are restored from full page images. It would require fetching the 
| > page and checking the free space on it, or peeking into the size of the 
| > backup block data, and I'm not sure if it's worth the extra code to do that.
| 
| I'd vote not to bother, at least not in the first cut.  As you say, 100%
| accuracy isn't required, and I think that in typical scenarios an
| insert/update that causes a page to become full would be relatively less
| likely to have a full-page image.

This is the 'first cut' shape, which hadn't caused an apparent
problem without ALL_FROZEN.

> all-frozen. Though heap_xlog_clean loads the heap page into memory
> for the redo operation, forcing heap_xlog_clean to update the FSM
> might be overkill for this solution, because it can happen on every
> page that is marked as neither all-visible nor all-frozen. Basically 100%

I'm not sure that it is definitely not overkill, but it seems to me
the same as the 20% rule for the insert/update cases. We must avoid 0%
or too-small (under 20%?) FSM info on heap_clean, especially for the
FREEZE case.

> accuracy of FSM is not required. On the other hand, if we make

Yes, what is needed here is not accuracy, but a minimum guarantee not
to cause a critical problem.

> heap_xlog_visible updates the FSM, it requires loading both the heap
> page and the FSM page, which can also be overhead. Another idea is, we
> can have heap_xlog_visible carry the freespace of the corresponding
> heap page, and then update the FSM during recovery.

I hadn't considered that. Counting freespace from visibility logs is
worse from an I/O perspective. It seems somewhat arbitrary, but having
freespace in VM records seems to work.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tomas Vondra



On 03/27/2017 05:51 PM, Robert Haas wrote:

On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:

Robert Haas  writes:

On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
 wrote:

But it seems a bit futile to produce the parallel plan in the first place,
because with max_parallel_workers=0 we can't possibly get any parallel
workers ever. I wonder why compute_parallel_worker() only looks at
max_parallel_workers_per_gather, i.e. why shouldn't it do:
parallel_workers = Min(parallel_workers, max_parallel_workers);
Perhaps this was discussed and is actually intentional, though.



It was intentional.  See the last paragraph of
https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com


Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.


Hey, imagine if the comments explained the logic behind the code!

Good idea.  How about the attached?



Certainly an improvement. But perhaps we should also mention this at 
compute_parallel_worker, i.e. that not looking at max_parallel_workers 
is intentional.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Improve access to parallel query from procedural languages.

2017-03-27 Thread Rafia Sabih
On Mon, Mar 27, 2017 at 5:54 PM, Robert Haas  wrote:
>
> If it's just that they are relying on unsynchronized global variables,
> then it's sufficient to mark them parallel-restricted ('r').  Do we
> really need to go all the way to parallel-unsafe ('u')?
>
Done.

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/


system_defined_fn_update_v2.patch
Description: Binary data



Re: [HACKERS] O(1) DSM handle operations

2017-03-27 Thread Thomas Munro
On Tue, Mar 28, 2017 at 3:52 PM, Robert Haas  wrote:
> On Mon, Mar 27, 2017 at 5:13 PM, Thomas Munro
>  wrote:
>> This is just a thought for discussion, no patch attached...
>>
>> DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
>> linear searches of the dsm_control->item array for either a free slot
>> or a slot matching a given handle.  Maybe no one thinks this is a
>> problem, because in practice the number of DSM slots you need to scan
>> should be something like number of backends * some small factor at
>> peak.
>
> One thing I thought about when designing the format of the DSM control
> segment was that we need to (attempt to) reread the old segment after
> recovering from a crash, even if it's borked.  With the current
> design, I think that nothing too bad can happen even if some or all of
> the old control segment has been overwritten with gibberish.  I mean,
> if we get particularly unlucky, we might manage to remove a DSM
> segment that some other cluster is using, but we'd have to be very
> unlucky for things to even get that bad, and we shouldn't crash
> outright.
>
> If we replace the array with some more complicated data structure,
> we'd have to be sure that reading it is robust against it having been
> scrambled by a previous crash.  Otherwise, it won't be possible to
> restart the cluster without manual intervention.

Couldn't cleanup code continue to work just the same way though?  The
only extra structure is an intrusive freelist, but that could be
completely ignored by code that wants to scan the whole array after
crash.  It would only be used to find a free slot after successful
restart, once the freelist is rebuilt and known to be sane, and could
be sanity checked when accessed by dsm_create.  So idea 2 doesn't seem
to make that code any less robust, does it?
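
For instance, idea 2 might look something like this (the "first_free"
link is an assumption, not actual code; the other fields are abridged):

    typedef struct dsm_control_item
    {
        dsm_handle  handle;
        uint32      refcnt;         /* 0 = unused */
        uint32      first_free;     /* intrusive freelist link; ignored
                                     * during post-crash scans and rebuilt
                                     * on restart */
    } dsm_control_item;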

Deterministic key_t values for SysV IPC do seem problematic though,
for multiple PostgreSQL clusters. Maybe that is a serious problem for
idea 1.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Craig Ringer
On 28 March 2017 at 10:53, Craig Ringer  wrote:

> However, even once I add an option to force decoding of 2pc xacts with
> catalog changes to test_decoding, I cannot reproduce the expected
> locking issues so far. See tests in attached updated version, in
> contrib/test_decoding/sql/prepare.sql .

I haven't been able to create issues with CLUSTER, any ALTER TABLEs
I've tried, or anything similar.

An explicit AEL on pg_attribute causes the decoding stall, but you
can't do anything much else either, and I don't see how that'd arise
under normal circumstances.

If it's a sufficiently obscure issue I'm willing to document it as
"don't do that" or "use a command filter to prohibit that". But it's
more likely that I'm just not spotting the cases where the issue
arises.

Attempting to CLUSTER a system catalog like pg_class or pg_attribute
causes PREPARE TRANSACTION to fail with

ERROR:  cannot PREPARE a transaction that modified relation mapping

and I didn't find any catalogs I could CLUSTER that'd also block decoding.



-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Petr Jelinek
On 28/03/17 04:46, Robert Haas wrote:
> On Mon, Mar 27, 2017 at 10:04 PM, Andres Freund  wrote:
>>> Btw now that I look at the code, I guess we'll want to get rid of
>>> bgw_main completely in HEAD given that we can't guarantee it will be
>>> valid even for shared_preload_library libraries. For older branches I
>>> would leave things as they are in this regard as there don't seem to be
>>> any immediate issue for standard binaries.
>>
>> As long as you fix it so culicidae is happy (in 9.6) ;).  I think it's
>> fine to just introduce bgw_builtin_id or such, and leave the bgw_main
>> code in place in < HEAD.
> 
> I wasn't thinking of introducing bgw_builtin_id.  My idea was just
> along the lines of
> 
> if (bgw_library_name == NULL && bgw_function_name != NULL)
> {
> if (strcmp(bgw_function_name, "ParallelQueryMain") == 0)
>ParallelQueryMain(blah);
> else if (strcmp(bgw_function_name, "LogicalReplicationMain") == 0)
>LogicalReplicationMain(blah);
> }
> 
> I think something like that is certainly better for the back-branches,
> because it doesn't cause an ABI break.  But I think it would also be
> fine for master.
> 

I had something slightly more complex like the attached in mind.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


0001-Remove-bgw_main-from-bgworker-and-always-use-library.patch
Description: binary/octet-stream



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Mon, Mar 27, 2017 at 4:45 PM, Amit Kapila 
wrote:

> On Sat, Mar 25, 2017 at 1:24 PM, Amit Kapila 
> wrote:
> > On Fri, Mar 24, 2017 at 11:49 PM, Pavan Deolasee
> >  wrote:
> >>
> >> On Fri, Mar 24, 2017 at 6:46 PM, Amit Kapila 
> >> wrote:
> >>>
> >
> >> While looking at this problem, it occurred to me that the assumptions
> made
> >> for hash indexes are also wrong :-( Hash index has the same problem as
> >> expression indexes have. A change in heap value may not necessarily
> cause a
> >> change in the hash key. If we don't detect that, we will end up having
> >> two identical hash keys with the same TID pointer. This will cause the
> >> duplicate key scans problem since hashrecheck will return true for both
> >> the hash entries.
>
> Isn't it possible to detect duplicate keys in hashrecheck if we
> compare both hashkey and tid stored in index tuple with the
> corresponding values from heap tuple?
>
>
Hmm.. I thought that won't work. For example, say we have a tuple (X, Y, Z)
in the heap with a btree index on X and a hash index on Y. If that is
updated to (X, Y', Z) and say we do a WARM update and insert a new entry in
the hash index. Now if Y and Y' both generate the same hashkey, we will
have two identical-looking tuples in the hash index,
leading to duplicate key scans.

I think one way to solve this is to pass both old and new heap values
to amwarminsert and expect each AM to detect duplicates and avoid
creating a WARM pointer if the index keys are exactly the same (we can
do that since there already exists another index tuple with the same
keys pointing to the same root TID).
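
As a hedged sketch of that idea (all names assumed, not the actual WARM
patch):

    static bool
    index_keys_unchanged(int nkeys,
                         Datum *oldval, bool *oldnull,
                         Datum *newval, bool *newnull)
    {
        int         i;

        for (i = 0; i < nkeys; i++)
        {
            if (oldnull[i] != newnull[i])
                return false;
            /* by-reference types would need datumIsEqual(), not == */
            if (!oldnull[i] && oldval[i] != newval[i])
                return false;
        }
        return true;    /* skip WARM insert; equal keys already indexed */
    }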

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kang Yuzhe

> 1. Prepare Hands-on with PG internals
> 
> 
>  For example, a complete Hands-on  with SELECT/INSERT SQL Standard PG
> internals. The point is the experts can pick one fairly complex feature
> and walk it from Parser to Executor in a hands-on manner explaining step
> by step every technical detail.

I sympathize with you.  What level of detail do you have in mind?  The query 
processing framework is described in the manual:

Chapter 50. Overview of PostgreSQL Internals
https://www.postgresql.org/docs/devel/static/overview.html

More detailed source code analysis is provided for very old PostgreSQL 7.4, but 
I guess it's not much different now.  The document is in Japanese only: 

http://ikubo.x0.com/PostgreSQL/pg_source.htm

Are you thinking of something like this?

MySQL Internals Manual
https://dev.mysql.com/doc/internals/en/





Regards
Takayuki Tsunakawa




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Bruce Momjian
On Tue, Mar 28, 2017 at 08:04:34AM +0530, Pavan Deolasee wrote:
> And I've answered it so many times by now :-) 

LOL

> Just to add more to what I just said in another email, note that HOT/WARM
> chains are created when a new root line pointer is created in the heap (a line
> pointer that has an index pointing to it). And a new root line pointer is
> created when a non-HOT/non-WARM update is performed. As soon as you do a
> non-HOT/non-WARM update, the next update can again be a WARM update even when
> everything fits in a single block. 
> 
> That's why for a workload which doesn't do HOT updates and where not all index
> keys are updated, you'll find every alternate update to a row to be a WARM
> update, even when there is no chain conversion. That itself can save lots of
> index bloat, reduce IO on the index and WAL.
> 
> Let me know if its still not clear and I can draw some diagrams to explain it.

Ah, yes, that does help to explain the 50% because 50% of updates are
now HOT/WARM.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Craig Ringer
On 28 March 2017 at 09:25, Andres Freund  wrote:

> If you actually need separate decoding of 2PC, then you want to wait for
> the PREPARE to be replicated.  If that replication has to wait for the
> to-be-replicated prepared transaction to commit prepared, and commit
> prepare will only happen once replication happened...

In other words, the output plugin cannot decode a transaction at
PREPARE TRANSACTION time if that xact holds an AccessExclusiveLock on
a catalog relation we must be able to read in order to decode the
xact.

>> Is there any other scenarios where catalog readers are blocked except 
>> explicit lock
>> on catalog table? Alters on catalogs seems to be prohibited.
>
> VACUUM FULL on catalog tables (but that can't happen in xact => 2pc)
> CLUSTER on catalog tables (can happen in xact)
> ALTER on tables modified in the same transaction (even of non catalog
> tables!), because a lot of routines will do a heap_open() to get the
> tupledesc etc.

Right, and the latter one is the main issue, since it's by far the
most likely and hard to just work around.

The tests Stas has in place aren't sufficient to cover this, as they
decode only after everything has committed. I'm expanding the
pg_regress coverage to do decoding between prepare and commit (when we
actually care) first, and will add some tests involving strong locks.
I've found one bug where it doesn't decode a 2pc xact at prepare or
commit time, even without restart or strong lock issues. Pretty sure
it's due to assumptions made about the filter callback.

The current code as used by test_decoding won't work correctly. If
txn->has_catalog_changes and if it's still in-progress, the filter
skips decoding at PREPARE time. But it isn't then decoded at COMMIT
PREPARED time either, if we processed past the PREPARE TRANSACTION.
Bug.

Also, by skipping decoding of 2pc xacts with catalog changes in this
test we also hide the locking issues.

However, even once I add an option to force decoding of 2pc xacts with
catalog changes to test_decoding, I cannot reproduce the expected
locking issues so far. See tests in attached updated version, in
contrib/test_decoding/sql/prepare.sql .

Haven't done any TAP tests yet, since the pg_regress tests are so far
sufficient to turn up issues.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e23909a9929b561e011f41891825bfb5b1ecb1b3 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Tue, 28 Mar 2017 10:51:48 +0800
Subject: [PATCH] Logical decoding of two-phase commit transactions at PREPARE
 time

---
 contrib/test_decoding/expected/prepared.out | 253 
 contrib/test_decoding/sql/prepared.sql  |  82 +++-
 contrib/test_decoding/test_decoding.c   | 205 ++-
 src/backend/access/rmgrdesc/xactdesc.c  |  37 +++-
 src/backend/access/transam/twophase.c   |  89 -
 src/backend/access/transam/xact.c   |  72 ++-
 src/backend/replication/logical/decode.c| 148 --
 src/backend/replication/logical/logical.c   | 141 +
 src/backend/replication/logical/reorderbuffer.c | 129 +++-
 src/backend/replication/logical/snapbuild.c |  48 -
 src/include/access/twophase.h   |   3 +
 src/include/access/xact.h   |  44 -
 src/include/replication/logical.h   |   6 +-
 src/include/replication/output_plugin.h |  36 
 src/include/replication/reorderbuffer.h |  50 +
 src/include/replication/snapbuild.h |   4 +
 16 files changed, 1254 insertions(+), 93 deletions(-)

diff --git a/contrib/test_decoding/expected/prepared.out b/contrib/test_decoding/expected/prepared.out
index 46e915d..56c6e72 100644
--- a/contrib/test_decoding/expected/prepared.out
+++ b/contrib/test_decoding/expected/prepared.out
@@ -6,19 +6,84 @@ SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_d
  init
 (1 row)
 
-CREATE TABLE test_prepared1(id int);
-CREATE TABLE test_prepared2(id int);
+SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot_2pc', 'test_decoding');
+ ?column? 
+--
+ init
+(1 row)
+
+CREATE TABLE test_prepared1(id integer primary key);
+CREATE TABLE test_prepared2(id integer primary key);
+-- Reused queries
+\set get_no2pc 'SELECT data FROM pg_logical_slot_get_changes(''regression_slot_2pc'', NULL, NULL, ''include-xids'', ''0'', ''skip-empty-xacts'', ''1'');'
+\set get_with2pc 'SELECT data FROM pg_logical_slot_get_changes(''regression_slot'', NULL, NULL, ''include-xids'', ''0'', ''skip-empty-xacts'', ''1'', ''twophase-decoding'', ''1'');'
+\set get_with2pc_nofilter 'SELECT data FROM pg_logical_slot_get_changes(''regression_slot'', NULL, NULL, ''include-xids'', ''0'', ''skip-empty-xacts'', ''1'', ''twophase-decoding'', ''1'', 

Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> I wasn't thinking of introducing bgw_builtin_id.  My idea was just
> along the lines of

> if (bgw_library_name == NULL && bgw_function_name != NULL)
> {
> if (strcmp(bgw_function_name, "ParallelQueryMain") == 0)
>ParallelQueryMain(blah);
> else if (strcmp(bgw_function_name, "LogicalReplicationMain") == 0)
>LogicalReplicationMain(blah);
> }

> I think something like that is certainly better for the back-branches,
> because it doesn't cause an ABI break.  But I think it would also be
> fine for master.

That seems perfectly reasonable from here: surely the cost of a couple
of strcmp's is trivial in comparison to a process launch.

We can redesign the API whenever this way starts getting unwieldy,
but that's likely to be quite some time away.

regards, tom lane




Re: [HACKERS] O(1) DSM handle operations

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 5:13 PM, Thomas Munro
 wrote:
> This is just a thought for discussion, no patch attached...
>
> DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
> linear searches of the dsm_control->item array for either a free slot
> or a slot matching a given handle.  Maybe no one thinks this is a
> problem, because in practice the number of DSM slots you need to scan
> should be something like number of backends * some small factor at
> peak.

One thing I thought about when designing the format of the DSM control
segment was that we need to (attempt to) reread the old segment after
recovering from a crash, even if it's borked.  With the current
design, I think that nothing too bad can happen even if some or all of
the old control segment has been overwritten with gibberish.  I mean,
if we get particularly unlucky, we might manage to remove a DSM
segment that some other cluster is using, but we'd have to be very
unlucky for things to even get that bad, and we shouldn't crash
outright.

If we replace the array with some more complicated data structure,
we'd have to be sure that reading it is robust against it having been
scrambled by a previous crash.  Otherwise, it won't be possible to
restart the cluster without manual intervention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Michael Paquier
On Mon, Mar 27, 2017 at 11:20 PM, Tom Lane  wrote:
> Alexander Korotkov  writes:
>> On Mon, Mar 27, 2017 at 4:48 PM, Tom Lane  wrote:
>>> Actually, the *real* problem with that coding is it lacks a SQLSTATE
>>> (errcode call).  The only places where it's acceptable to leave that
>>> out are for internal "can't happen" cases, which this surely isn't.
>
>> Surrounding code also has ereports lacking SQLSTATE.  And that isn't "can't
>> happen" case as well.
>> Should we consider fixing them?
>
> Yup.  Just remember that the default is
> XX000    E    ERRCODE_INTERNAL_ERROR    internal_error
>
> If that's not how you want the error case reported, you need an errcode()
> call.
>
> We might need more ERRCODEs than are there now, if none of the existing
> ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED
> and ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
> example?

While I agree with that in general, we are talking about 2PC files that
are on disk in patch 0001, so I would think that in this case
ERRCODE_DATA_CORRUPTED is the most adapted (or
ERRCODE_TWOPHASE_CORRUPTED?).

The other WARNING messages refer to stale files of already committed
transactions, which are not actually corrupted. What about in this
case having a ERRCODE_TWOPHASE_INVALID?

Updated patches are attached, I did not change the WARNING portion
though as I am not sure what's the consensus on the matter.
-- 
Michael


0001-Change-detection-of-corrupted-2PC-files-as-FATAL.patch
Description: Binary data


0002-Minimize-window-between-history-file-and-end-of-reco.patch
Description: Binary data



Re: [HACKERS] standardized backwards incompatibility tag for commits

2017-03-27 Thread Bruce Momjian
On Sat, Mar 25, 2017 at 04:15:39PM -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Seems like it'd be good to standardize how we're declaring that a commit
> > contains backwards incompatible changes.  I've seen
> > - 'BACKWARDS INCOMPATIBLE CHANGE'
> > - 'BACKWARD INCOMPATIBILITY'
> > - a lot of free-flow text annotations like "as a
> >   backward-incompatibility", "This makes a backwards-incompatible change"
> 
> > Especially the latter are easy to miss when looking through the commit
> > log and I'd bet some get missed when generating the release notes.
> 
> Bruce might have a different opinion, but for my own part I do not think
> it would make any difference in creating the release notes.  The important
> thing is that the information be there in the log entry, not exactly how
> it's spelled.

Yes, it doesn't matter as long as it is stated somehow.  I don't know of
any missing cases due to text differences.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Crash on promotion when recovery.conf is renamed

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> "Tsunakawa, Takayuki"  writes:
> > All other places in twophase.c and most places in other files put ereport()
> > and errmsg() on separate lines.  I think it would be better to align with
> > surrounding code.
> 
> > +   ereport(FATAL, (errmsg("corrupted two-phase file \"%s\"",
> 
> Actually, the *real* problem with that coding is it lacks a SQLSTATE (errcode
> call).  The only places where it's acceptable to leave that out are for
> internal "can't happen" cases, which this surely isn't.

Oh, I overlooked it.  But...


> Yup.  Just remember that the default is
> XX000    E    ERRCODE_INTERNAL_ERROR    internal_error
> 
> If that's not how you want the error case reported, you need an errcode()
> call.
> 
> We might need more ERRCODEs than are there now, if none of the existing
> ones seem to fit these cases.  There's already ERRCODE_DATA_CORRUPTED and
> ERRCODE_INDEX_CORRUPTED; maybe we need ERRCODE_WAL_CORRUPTED, for
> example?

I'd always be happy if the error code were more specific, but maybe that
would be a separate patch.  So far, WAL corruption messages don't carry a
specific error code, as in this code in xlog.c:

/*
 * We only end up here without a message when XLogPageRead()
 * failed - in that case we already logged something. In
 * StandbyMode that only happens if we have been triggered, so we
 * shouldn't loop anymore in that case.
 */
if (errormsg)
    ereport(emode_for_corrupt_record(emode,
                                     RecPtr ? RecPtr : EndRecPtr),
            (errmsg_internal("%s", errormsg) /* already translated */ ));
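
A separate patch along those lines would presumably just add a specific
errcode to call sites like the above, e.g. (a sketch, assuming
ERRCODE_DATA_CORRUPTED were chosen for WAL as well):

    if (errormsg)
        ereport(emode_for_corrupt_record(emode,
                                         RecPtr ? RecPtr : EndRecPtr),
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("%s", errormsg) /* already translated */ ));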

Regards
Takayuki Tsunakawa






Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 10:04 PM, Andres Freund  wrote:
>> Btw now that I look at the code, I guess we'll want to get rid of
>> bgw_main completely in HEAD given that we can't guarantee it will be
>> valid even for shared_preload_library libraries. For older branches I
>> would leave things as they are in this regard as there don't seem to be
>> any immediate issue for standard binaries.
>
> As long as you fix it so culicidae is happy (in 9.6) ;).  I think it's
> fine to just introduce bgw_builtin_id or such, and leave the bgw_main
> code in place in < HEAD.

I wasn't thinking of introducing bgw_builtin_id.  My idea was just
along the lines of

if (bgw_library_name == NULL && bgw_function_name != NULL)
{
    if (strcmp(bgw_function_name, "ParallelQueryMain") == 0)
        ParallelQueryMain(blah);
    else if (strcmp(bgw_function_name, "LogicalReplicationMain") == 0)
        LogicalReplicationMain(blah);
}

I think something like that is certainly better for the back-branches,
because it doesn't cause an ABI break.  But I think it would also be
fine for master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Andres Freund
On 2017-03-28 03:30:28 +0100, Simon Riggs wrote:
> On 28 March 2017 at 02:25, Andres Freund  wrote:
> > On 2017-03-28 04:12:41 +0300, Stas Kelvich wrote:
> >>
> >> > On 28 Mar 2017, at 00:25, Andres Freund  wrote:
> >> >
> >> > Hi,
> >> >
> >> > On 2017-03-28 00:19:29 +0300, Stas Kelvich wrote:
> >> >> Ok, here it is.
> >> >
> >> > On a very quick skim, this doesn't seem to solve the issues around
> >> > deadlocks of prepared transactions vs. catalog tables.  What if the
> >> > prepared transaction contains something like LOCK pg_class; (there's a
> >> > lot more realistic examples)? Then decoding won't be able to continue,
> >> > until that transaction is committed / aborted?
> >>
> >> But why is that deadlock? Seems as just lock.
> >
> > If you actually need separate decoding of 2PC, then you want to wait for
> > the PREPARE to be replicated.  If that replication has to wait for the
> > to-be-replicated prepared transaction to commit prepared, and commit
> > prepare will only happen once replication happened...
> 
> Surely that's up to the decoding plugin?

It can't do much about it, so not really.  A lot of the functions
dealing with datatypes (temporarily) lock relations.  Both the actual
user tables, and system catalog tables (cache lookups...).


> If the plugin takes locks it had better make sure it can get the locks
> or timeout. But that's true of any resource the plugin needs access to
> and can't obtain when needed.


> This issue could occur now if the transaction took a session lock on a
> catalog table.

That's not a self-deadlock, and we don't do session locks outside
of operations like CIC?

Greetings,

Andres Freund




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 7:49 AM, Bruce Momjian  wrote:

> On Mon, Mar 27, 2017 at 04:29:56PM -0400, Robert Haas wrote:
> > On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
> >  wrote:
> > > It's quite hard to say that until we see many more benchmarks. As
> > > author of the patch, I might have got repetitive with my benchmarks.
> > > But I've seen over 50% improvement in TPS even without chain
> > > conversion (6 indexes on a 12 column table test).
> >
> > This seems quite mystifying.  What can account for such a large
> > performance difference in such a pessimal scenario?  It seems to me
> > that without chain conversion, WARM can only apply to each row once
> > and therefore no sustained performance improvement should be possible
> > -- unless rows are regularly being moved to new blocks, in which case
> > those updates would "reset" the ability to again perform an update.
> > However, one would hope that most updates get done within a single
> > block, so that the row-moves-to-new-block case wouldn't happen very
> > often.
> >
> > I'm perplexed.
>
> Yes, I asked the same question in this email:
>
> https://www.postgresql.org/message-id/2017032119.ge16...@momjian.us
>
>
And I've answered it so many times by now :-)

Just to add more to what I just said in another email, note that HOT/WARM
chains are created when a new root line pointer is created in the heap (a
line pointer that has an index pointing to it). And a new root line pointer
is created when a non-HOT/non-WARM update is performed. As soon as you do a
non-HOT/non-WARM update, the next update can again be a WARM update even
when everything fits in a single block.

That's why for a workload which doesn't do HOT updates and where not all
index keys are updated, you'll find every alternate update to a row to be a
WARM update, even when there is no chain conversion. That itself can avoid
lots of index bloat and reduce IO on the index and WAL.

Let me know if it's still not clear and I can draw some diagrams to explain
it.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Simon Riggs
On 28 March 2017 at 02:25, Andres Freund  wrote:
> On 2017-03-28 04:12:41 +0300, Stas Kelvich wrote:
>>
>> > On 28 Mar 2017, at 00:25, Andres Freund  wrote:
>> >
>> > Hi,
>> >
>> > On 2017-03-28 00:19:29 +0300, Stas Kelvich wrote:
>> >> Ok, here it is.
>> >
>> > On a very quick skim, this doesn't seem to solve the issues around
>> > deadlocks of prepared transactions vs. catalog tables.  What if the
>> > prepared transaction contains something like LOCK pg_class; (there's a
>> > lot more realistic examples)? Then decoding won't be able to continue,
>> > until that transaction is committed / aborted?
>>
>> But why is that deadlock? Seems as just lock.
>
> If you actually need separate decoding of 2PC, then you want to wait for
> the PREPARE to be replicated.  If that replication has to wait for the
> to-be-replicated prepared transaction to commit prepared, and commit
> prepare will only happen once replication happened...

Surely that's up to the decoding plugin?

If the plugin takes locks it had better make sure it can get the locks
or timeout. But that's true of any resource the plugin needs access to
and can't obtain when needed.

This issue could occur now if the transaction tool a session lock on a
catalog table.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:59 AM, Robert Haas  wrote:

> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
>  wrote:
> > It's quite hard to say that until we see many more benchmarks. As author
> > of the patch, I might have got repetitive with my benchmarks. But I've seen
> > over 50% improvement in TPS even without chain conversion (6 indexes on
> > a 12 column table test).
>
> This seems quite mystifying.  What can account for such a large
> performance difference in such a pessimal scenario?  It seems to me
> that without chain conversion, WARM can only apply to each row once
> and therefore no sustained performance improvement should be possible
> -- unless rows are regularly being moved to new blocks, in which case
> those updates would "reset" the ability to again perform an update.
> However, one would hope that most updates get done within a single
> block, so that the row-moves-to-new-block case wouldn't happen very
> often.
>
>
I think you're confusing update chains that stay within a block with
HOT/WARM chains. Even when the entire update chain stays within a block, it
can be made up of multiple HOT/WARM chains, and each of these chains offers
the ability to do one WARM update. So even without chain conversion, every
alternate update will be a WARM update, and the gains are perpetual.

For example, take a simplistic example of a table with just one tuple
and four indexes, where every update modifies just one of the indexed columns.
Assuming no WARM chain conversion, this is what would happen for every
update:

1. WARM update, new entry in just one index
2. Regular update, new entries in all indexes
3. WARM update, new entry in just one index
4. Regular update, new entries in all indexes

At the end of N updates (assuming all fit in the same block), one index
will have N entries and the rest will have N/2 entries.

Compare that against master:
1. Regular update, new entries in all indexes
2. Regular update, new entries in all indexes
3. Regular update, new entries in all indexes
4. Regular update, new entries in all indexes


At the end of N updates (assuming all fit in the same block), all indexes
will have N entries.  So with WARM we reduce bloat in 3 indexes. And WARM
works almost perpetually even without chain conversion. If you look at
the graph I shared earlier (attached again), without WARM chain conversion
the rate of WARM updates settles down to 50%, which is not surprising given
what I explained above.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Bruce Momjian
On Mon, Mar 27, 2017 at 04:29:56PM -0400, Robert Haas wrote:
> On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
>  wrote:
> > It's quite hard to say that until we see many more benchmarks. As author of
> > the patch, I might have got repetitive with my benchmarks. But I've seen
> > over 50% improvement in TPS even without chain conversion (6 indexes on a 12
> > column table test).
> 
> This seems quite mystifying.  What can account for such a large
> performance difference in such a pessimal scenario?  It seems to me
> that without chain conversion, WARM can only apply to each row once
> and therefore no sustained performance improvement should be possible
> -- unless rows are regularly being moved to new blocks, in which case
> those updates would "reset" the ability to again perform an update.
> However, one would hope that most updates get done within a single
> block, so that the row-moves-to-new-block case wouldn't happen very
> often.
> 
> I'm perplexed.

Yes, I asked the same question in this email:

https://www.postgresql.org/message-id/2017032119.ge16...@momjian.us

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Pavan Deolasee
On Tue, Mar 28, 2017 at 1:32 AM, Alvaro Herrera 
wrote:

> Is the WARM tap test suite supposed to work when applied without all the
> other patches?  I just tried applying that one and running "make check -C
> src/test/modules", and it seems to hang after giving "ok 5" for
> t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)
>
>
These tests should run without WARM. I wonder though if IPC::Run's
start/pump/finish facility is fully portable. Andrew, in an off-list
conversation, reminded me that there are no (or maybe one) tests currently
using that in Postgres. I've run these tests on OS X and will try on some
Linux platform too.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Andres Freund
On 2017-03-28 03:47:50 +0200, Petr Jelinek wrote:
> On 28/03/17 03:31, Petr Jelinek wrote:
> > On 27/03/17 19:01, Robert Haas wrote:
> >> On Mon, Mar 27, 2017 at 12:50 PM, Andres Freund  wrote:
> >>> Robert, Petr, either of you planning to fix this (as outlined elsewhere
> >>> in the thread)?
> >>
> >> Oh, I didn't realize anybody was looking to me to fix this.  I sort of
> >> thought that it was fallout from the logical replication patch and
> >> that Petr or Peter would deal with it.  If that's not the case, I'm
> >> not totally unwilling to take a whack at it, but I don't have much
> >> personal enthusiasm for trying to figure out how to make dynamic
> >> loading on the postgres binary itself work everywhere, so if it falls
> >> to me to fix, it's likely to get a hard-coded check for some
> >> hard-coded name.
> >>
> > 
> > It affects parallel workers the same way. I'll write a patch for HEAD soon,
> > 9.6 probably with some delay. I'll expand the InternalBgWorkers thing
> > that was added with logical replication to handle this in a similar
> > fashion to how we do fmgrtab.
> > 
> 
> Btw now that I look at the code, I guess we'll want to get rid of
> bgw_main completely in HEAD given that we can't guarantee it will be
> valid even for shared_preload_library libraries. For older branches I
> would leave things as they are in this regard as there don't seem to be
> any immediate issue for standard binaries.

As long as you fix it so culicidae is happy (in 9.6) ;).  I think it's
fine to just introduce bgw_builtin_id or such, and leave the bgw_main
code in place in < HEAD.


Greetings,

Andres Freund




Re: [HACKERS] parallelize queries containing initplans

2017-03-27 Thread Amit Kapila
On Thu, Mar 16, 2017 at 2:34 AM, Kuntal Ghosh
 wrote:
> On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila  wrote:
>> Based on that idea, I have modified the patch such that it will
>> compute the set of initplan Params that are required below the gather
>> node and store them as a bitmap of initplan params at the gather node.
>> During set_plan_references, we can find the intersection of external
>> parameters that are required at the Gather or nodes below it with the
>> initplans that are passed from the same or above query level. Once the
>> set of initplan params is established, we evaluate those (if they are
>> not already evaluated) before execution of the gather node and then
>> pass the computed values to each of the workers.  To identify whether a
>> particular param is parallel safe or not, we check if the paramid of
>> the param exists in the initplans at the same or above query level.  We
>> don't allow generating a gather path if there are initplans at some
>> query level below the current query level as those plans could be
>> parallel-unsafe or undirect correlated plans.
>
> I would like to mention different test scenarios with InitPlans that
> we've considered while developing and testing of the patch.
>

Thanks a lot, Kuntal, for sharing the different test scenarios.
Unfortunately, this patch hasn't received any review till now, so
there is no chance of it making it into PostgreSQL 10.  I have moved
it to the next CF.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Petr Jelinek
On 28/03/17 03:31, Petr Jelinek wrote:
> On 27/03/17 19:01, Robert Haas wrote:
>> On Mon, Mar 27, 2017 at 12:50 PM, Andres Freund  wrote:
>>> Robert, Petr, either of you planning to fix this (as outlined elsewhere
>>> in the thread)?
>>
>> Oh, I didn't realize anybody was looking to me to fix this.  I sort of
>> thought that it was fallout from the logical replication patch and
>> that Petr or Peter would deal with it.  If that's not the case, I'm
>> not totally unwilling to take a whack at it, but I don't have much
>> personal enthusiasm for trying to figure out how to make dynamic
>> loading on the postgres binary itself work everywhere, so if it falls
>> to me to fix, it's likely to get a hard-coded check for some
>> hard-coded name.
>>
> 
> It affects parallel workers the same way. I'll write a patch for HEAD soon,
> 9.6 probably with some delay. I'll expand the InternalBgWorkers thing
> that was added with logical replication to handle this in a similar
> fashion to how we do fmgrtab.
> 

Btw now that I look at the code, I guess we'll want to get rid of
bgw_main completely in HEAD given that we can't guarantee it will be
valid even for shared_preload_library libraries. For older branches I
would leave things as they are in this regard as there don't seem to be
any immediate issue for standard binaries.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Michael Paquier
On Tue, Mar 28, 2017 at 9:37 AM, Michael Paquier
 wrote:
> On Tue, Mar 28, 2017 at 8:38 AM, Tsunakawa, Takayuki
>  wrote:
>> From: pgsql-hackers-ow...@postgresql.org
>>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
>>> Do you think that this qualifies as a bug fix for a backpatch? I would think
>>> so, but I would not mind waiting for some dust to be on it before 
>>> considering
>>> applying that on back-branches. Thoughts from others?
>>
>> I think so, too.  As change from rename() to durable_rename() was applied to 
>> all releases, maybe we can follow that.  In addition, keeping code 
>> differences as few as possible would make it easier for committers to apply 
>> other bug fixes to all versions.
>
> The patch for HEAD applies cleanly on REL9_6_STABLE,

Actually that's not entirely correct, as pg_clog has been renamed to
pg_xact. Let's be careful here.
-- 
Michael




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Petr Jelinek
On 27/03/17 19:01, Robert Haas wrote:
> On Mon, Mar 27, 2017 at 12:50 PM, Andres Freund  wrote:
>> Robert, Petr, either of you planning to fix this (as outlined elsewhere
>> in the thread)?
> 
> Oh, I didn't realize anybody was looking to me to fix this.  I sort of
> thought that it was fallout from the logical replication patch and
> that Petr or Peter would deal with it.  If that's not the case, I'm
> not totally unwilling to take a whack at it, but I don't have much
> personal enthusiasm for trying to figure out how to make dynamic
> loading on the postgres binary itself work everywhere, so if it falls
> to me to fix, it's likely to get a hard-coded check for some
> hard-coded name.
> 

It affects parallel workers the same way. I'll write a patch for HEAD soon,
9.6 probably with some delay. I'll expand the InternalBgWorkers thing
that was added with logical replication to handle this in a similar
fashion to how we do fmgrtab.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Andres Freund
On 2017-03-28 04:12:41 +0300, Stas Kelvich wrote:
> 
> > On 28 Mar 2017, at 00:25, Andres Freund  wrote:
> > 
> > Hi,
> > 
> > On 2017-03-28 00:19:29 +0300, Stas Kelvich wrote:
> >> Ok, here it is.
> > 
> > On a very quick skim, this doesn't seem to solve the issues around
> > deadlocks of prepared transactions vs. catalog tables.  What if the
> > prepared transaction contains something like LOCK pg_class; (there's a
> > lot more realistic examples)? Then decoding won't be able to continue,
> > until that transaction is committed / aborted?
> 
> But why is that deadlock? Seems as just lock.

If you actually need separate decoding of 2PC, then you want to wait for
the PREPARE to be replicated.  If that replication has to wait for the
to-be-replicated prepared transaction to commit prepared, and commit
prepare will only happen once replication happened...


> Are there any other scenarios where catalog readers are blocked, except an
> explicit lock on a catalog table? Alters on catalogs seem to be prohibited.

VACUUM FULL on catalog tables (but that can't happen in xact => 2pc)
CLUSTER on catalog tables (can happen in xact)
ALTER on tables modified in the same transaction (even of non-catalog
tables!), because a lot of routines will do a heap_open() to get the
tupledesc etc.

Greetings,

Andres Freund




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Ashutosh Sharma
On Mar 28, 2017 00:00, "Andreas Seltenreich"  wrote:

Ashutosh Sharma writes:
>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*) (&(bufHdr)->content_lock", File: "bufmgr.c", Line: 3397)
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.  Nodes are tiny consumer machines with
low-power 4-core sandy bridges.


Okay. Thanks for sharing this information. I tried running sqlsmith
on a single node. I ran it for an hour but didn't find any crash. Let me
also try running multiple sqlsmith instances at a time... maybe 5 on a
single node, like you are doing.


> [2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.


Sure, Thanks a lot.

With Regards,
Ashutosh Sharma


Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 28 Mar 2017, at 00:25, Andres Freund  wrote:
> 
> Hi,
> 
> On 2017-03-28 00:19:29 +0300, Stas Kelvich wrote:
>> Ok, here it is.
> 
> On a very quick skim, this doesn't seem to solve the issues around
> deadlocks of prepared transactions vs. catalog tables.  What if the
> prepared transaction contains something like LOCK pg_class; (there's a
> lot more realistic examples)? Then decoding won't be able to continue,
> until that transaction is committed / aborted?

But why is that deadlock? Seems as just lock.

In case of a prepared lock on pg_class, decoding will wait until it is
committed and then continue to decode. The same goes for anything in postgres
that accesses pg_class, up to being unable to connect to the database and
bricking the database if you accidentally disconnect before committing that
tx (as you showed me a while ago :-).

IMO it is an issue of being able to prepare such a lock, rather than of decoding.

Are there any other scenarios where catalog readers are blocked, except an
explicit lock on a catalog table? Alters on catalogs seem to be prohibited.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] [POC] hash partitioning

2017-03-27 Thread Tatsuo Ishii
> Please post an explanation for the delay and a schedule for the new
> patch.  If no patch or explanation is posted by 2017-03-17 AoE I will
> mark this submission "Returned with Feedback".

Despite the fact that Yugo has posted a new patch on 2017-03-17, this
item has been marked as "Returned with Feedback". I don't know why.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Ashutosh Sharma
On Mar 27, 2017 22:25, "Robert Haas"  wrote:

On Fri, Mar 24, 2017 at 3:49 AM, Amit Kapila 
wrote:
> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma 
wrote:
>>>
>>> I think it would have been probably okay to use *int* for ntuples as
>>> that matches with what you are actually assigning in the function.
>>
>> okay, corrected it. Attached is newer version of patch.
>
> Thanks, this version looks good to me.

Committed.


Thank you.


Re: [HACKERS] Partitioned tables and relfilenode

2017-03-27 Thread Amit Langote
Hi,

On 2017/03/28 0:13, Maksim Milyutin wrote:
> On 24.03.2017 03:54, Amit Langote wrote:
>>
>> And here is the updated patch.
>>
> 
> Perhaps you forgot my fix in the updated patch
> 
> diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
> index 3999e6e..36917c8 100644
> --- a/src/backend/catalog/heap.c
> +++ b/src/backend/catalog/heap.c
> @@ -1823,7 +1823,8 @@ heap_drop_with_catalog(Oid relid)
>   */
>  if (rel->rd_rel->relkind != RELKIND_VIEW &&
>  rel->rd_rel->relkind != RELKIND_COMPOSITE_TYPE &&
> -rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
> +rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE &&
> +rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
>  {
>  RelationDropStorage(rel);
>  }

Oops, my bad.  I will include it in the patch I'll send after addressing
Robert's comments.  Thanks again!

Regards,
Amit






Re: [HACKERS] [COMMITTERS] pgsql: Clean up Perl code according to perlcritic

2017-03-27 Thread Tom Lane
I wrote:
> Peter Eisentraut  writes:
>> Clean up Perl code according to perlcritic

> This seems to have broken the regression tests (specifically, dblink)
> on at least some of the Windows buildfarm critters.

I'm hardly a Perl expert, but it looks to me like the culprit is this
hunk in vcregress.pl:

@@ -521,8 +521,9 @@ sub fetchRegressOpts
# an unhandled variable reference.  Ignore anything that isn't an
# option starting with "--".
@opts = grep {
-   s/\Q$(top_builddir)\E/\"$topdir\"/;
-   $_ !~ /\$\(/ && $_ =~ /^--/
+   my $x = $_;
+   $x =~ s/\Q$(top_builddir)\E/\"$topdir\"/;
+   $x !~ /\$\(/ && $x =~ /^--/
} split(/\s+/, $1);
}
if ($m =~ /^\s*ENCODING\s*=\s*(\S+)/m)

The first line in that block actually intends to modify the value
it's inspecting, and perlcritic's "improved" version carefully removes
the side-effect.

No doubt there are cleaner ways to do that (the comments in "man perlfunc"
about this coding technique are not positive), but this way is not
cleaner, it is broken.

regards, tom lane




Re: [HACKERS] Bug in get_partition_for_tuple

2017-03-27 Thread Amit Langote
On 2017/03/27 23:54, Robert Haas wrote:
> On Wed, Mar 22, 2017 at 4:00 AM, Amit Langote
>  wrote:
>>> You're right, fixed that in the attached updated patch.
>>
>> Added this to the partitioning open items list, to avoid being forgotten
>> about.
>>
>> https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Partitioning
> 
> Thanks for adding it there.  I'm making a sweep through that list now,
> or at least the parts of it that obvious pertain to my commits.  This
> patch looks good, so committed.

Thanks.

Regards,
Amit






Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-27 Thread Amit Langote
On 2017/03/27 23:40, Robert Haas wrote:
> On Thu, Mar 9, 2017 at 7:20 PM, Amit Langote
>  wrote:
>> On 2017/03/10 9:10, Amit Langote wrote:
>>> On 2017/03/09 23:25, Robert Haas wrote:
>>>> On Fri, Feb 17, 2017 at 1:47 AM, Amit Langote wrote:
>>>>> I updated the patch.  Now it's reduced to simply removing the check in
>>>>> transformInsertStmt() that prevented using *any* ON CONFLICT on
>>>>> partitioned tables at all.
>>>>
>>>> This patch no longer applies.
>>>
>>> Rebased patch is attached.
>>
>> Oops, really attached this time,
> 
> Committed with a bit of wordsmithing of the documentation.

Thanks.

Regards,
Amit






Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Craig Ringer
On 28 March 2017 at 08:50, Stas Kelvich  wrote:
>
>> On 28 Mar 2017, at 00:19, Stas Kelvich  wrote:
>>
>> * It actually doesn’t pass one of my regression tests. I’ve added the expected
>> output as it should be. I’ll try to send a follow-up message with a fix, but
>> right now I'm sending it as is, as you asked.
>>
>>
>
> Fixed. I forgot to postpone ReorderBufferTxn cleanup in case of prepare.
>
> So it passes the provided regression tests right now.
>
> I’ll give it more testing tomorrow and am going to write a TAP test to check
> behaviour when we lose info about whether the prepare was sent to the
> subscriber or not.

Great, thanks. I'll try to have some TAP tests ready.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Craig Ringer
On 28 March 2017 at 05:25, Andres Freund  wrote:

> On a very quick skim, this doesn't seem to solve the issues around
> deadlocks of prepared transactions vs. catalog tables.  What if the
> prepared transaction contains something like LOCK pg_class; (there's a
> lot more realistic examples)? Then decoding won't be able to continue,
> until that transaction is committed / aborted?

Yeah, that's a problem and one we discussed in the past, though I lost
track of it in amongst the recent work.

I'm currently writing a few TAP tests intended to check this sort of
thing, mixed DDL/DML, overlapping xacts, interleaved prepared xacts,
etc. If they highlight problems they'll be useful for the next
iteration of this patch anyway.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 28 Mar 2017, at 00:19, Stas Kelvich  wrote:
> 
> * It actually doesn’t pass one of my regression tests. I’ve added the expected
> output as it should be. I’ll try to send a follow-up message with a fix, but
> right now I'm sending it as is, as you asked.
> 
> 

Fixed. I forgot to postpone ReorderBufferTxn cleanup in case of prepare.

So it passes the provided regression tests right now.

I’ll give it more testing tomorrow and am going to write a TAP test to check
behaviour when we lose info about whether the prepare was sent to the
subscriber or not.




logical_twophase.diff
Description: Binary data



Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-27 Thread Lukas Fittl
On Mon, Mar 27, 2017 at 8:24 PM, Tom Lane  wrote:
>
> Pushed with minor adjustments.
>

Excellent - thanks for your review and all the discussion here!


> The main non-cosmetic thing I did was to replace the floor(log10())
> business with plain constant "10" as I suggested before.  That's
> what we do in other places --- see int4out for an example --- and
> frankly I did not feel that a small space savings in a transient
> string buffer was worth the intellectual effort to verify whether
> that calculation was correct or not, never mind whatever runtime
> cycles it would take.  I don't believe the argument that it's safer
> your way: if you had an off-by-one thinko in the calculation, or even
> just roundoff error in the log10() call, it could result in an actual
> reachable buffer overrun, because there's no safety margin.
>

Makes sense, guess I was overthinking this one.

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Michael Paquier
On Tue, Mar 28, 2017 at 8:38 AM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
>> Do you think that this qualifies as a bug fix for a backpatch? I would think
>> so, but I would not mind waiting for some dust to be on it before considering
>> applying that on back-branches. Thoughts from others?
>
> I think so, too.  As change from rename() to durable_rename() was applied to 
> all releases, maybe we can follow that.  In addition, keeping code 
> differences as few as possible would make it easier for committers to apply 
> other bug fixes to all versions.

The patch for HEAD applies cleanly on REL9_6_STABLE; further down it
needs some work, but not much. I am fine with producing those
patches should there be any need.
-- 
Michael




[HACKERS] Failed with build PostgreSQL in Windows ("--with-perl" option)

2017-03-27 Thread Huong Dangminh
Hi,

We have been trying to build PostgreSQL in a Windows environment.

When building with the "--with-perl" option (set in the config.pl file),
the build could not find a "perl\d+.lib" file in the "C:\Perl\lib\CORE\"
folder, so it failed with the message below.

  "could not identify perl library version:  at src/tools/msvc/Mkvcbuild.pm
line 590"

The manual says that "ActiveState Perl version 5.8 or later is required".
https://www.postgresql.org/docs/9.6/static/install-windows-full.html#AEN30798

But in versions 5.22 and 5.24 (ActiveState Perl's current community
versions), we could not find the above lib file.

What version of ActiveState Perl is suitable for building PostgreSQL?
Or is there another solution for "--with-perl" builds on Windows?


Thanks and best regards,
---
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/





Re: [HACKERS] [PATCH] Use $ parameters as replacement characters for pg_stat_statements

2017-03-27 Thread Tom Lane
Lukas Fittl  writes:
> On Sat, Mar 18, 2017 at 12:46 PM, Tom Lane  wrote:
>> So it turns out this discussion just reinvented the alternative that
>> Lukas had in his 0002 proposal.  Are there any remaining objections
>> to proceeding with that approach?

> Thanks for reviewing - updated patch attached, comments below.

Pushed with minor adjustments.

The main non-cosmetic thing I did was to replace the floor(log10())
business with plain constant "10" as I suggested before.  That's
what we do in other places --- see int4out for an example --- and
frankly I did not feel that a small space savings in a transient
string buffer was worth the intellectual effort to verify whether
that calculation was correct or not, never mind whatever runtime
cycles it would take.  I don't believe the argument that it's safer
your way: if you had an off-by-one thinko in the calculation, or even
just roundoff error in the log10() call, it could result in an actual
reachable buffer overrun, because there's no safety margin.
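
For reference, the int4out-style sizing being described is simply this
(a sketch; "value" is a placeholder variable):

    /* sign, 10 digits, '\0' -- always enough for any int32 */
    char   *result = palloc(12);

    pg_ltoa(value, result);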

regards, tom lane




Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Michael Paquier
> Do you think that this qualifies as a bug fix for a backpatch? I would think
> so, but I would not mind waiting for some dust to be on it before considering
> applying that on back-branches. Thoughts from others?

I think so, too.  As change from rename() to durable_rename() was applied to 
all releases, maybe we can follow that.  In addition, keeping code differences 
as few as possible would make it easier for committers to apply other bug fixes 
to all versions.

Regards
Takayuki Tsunakawa





Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Michael Paquier
On Tue, Mar 28, 2017 at 1:34 AM, Teodor Sigaev  wrote:
> Thank you, pushed.

Thanks!

Do you think that this qualifies as a bug fix for a backpatch? I would
think so, but I would not mind waiting for some dust to be on it
before considering applying that on back-branches. Thoughts from
others?
-- 
Michael




Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-27 Thread Alexey Kondratov
Pavel, Craig and Stas,

Thank you for your responses and valuable comments!

I have written a draft proposal:
https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit


It seems that COPY currently is able to return the first error line and the
error type (extra or missing columns, type parse error, etc).
Thus, an approach similar to the one Stas wrote should work and, being
optimised for a small number of error rows, should not affect COPY
performance in that case.
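
As a rough illustration of that retry scheme (all helper names here are
hypothetical stand-ins for the actual COPY internals):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Row Row;                 /* parsed input row */
    extern void begin_subtx(void);
    extern void commit_subtx(void);
    extern void rollback_subtx(void);
    extern bool insert_row(Row *row);       /* false on error */
    extern void report_bad_row(Row *row);

    /*
     * Insert rows[0..n): try the whole remaining batch in one
     * subtransaction; if row i fails, redo the good prefix in a fresh
     * subtransaction, log row i, and resume after it.
     */
    static void
    copy_batch(Row **rows, size_t n)
    {
        size_t      start = 0;

        while (start < n)
        {
            size_t      i;
            bool        ok = true;

            begin_subtx();
            for (i = start; i < n; i++)
            {
                if (!insert_row(rows[i]))
                {
                    ok = false;
                    break;
                }
            }

            if (ok)
            {
                commit_subtx();
                break;
            }

            rollback_subtx();

            /* redo the rows known to be good, then skip the bad one */
            begin_subtx();
            for (size_t j = start; j < i; j++)
                (void) insert_row(rows[j]);
            commit_subtx();

            report_bad_row(rows[i]);
            start = i + 1;
        }
    }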

I will be glad to receive any critical remarks and suggestions.


Alexey


> On 23 Mar 2017, at 17:24, Stas Kelvich  wrote:
> 
>> 
>> On 23 Mar 2017, at 15:53, Craig Ringer  wrote:
>> 
>> On 23 March 2017 at 19:33, Alexey Kondratov  
>> wrote:
>> 
>>> (1) Add errors handling to COPY as a minimum program
>> 
>> Huge +1 if you can do it in an efficient way.
>> 
>> I think the main barrier to doing so is that the naïve approach
>> creates a subtransaction for every row, which is pretty dire in
>> performance terms and burns transaction IDs very rapidly.
>> 
>> Most of our datatype I/O functions, etc, have no facility for being
>> invoked in a mode where they fail nicely and clean up after
>> themselves. We rely on unwinding the subtransaction's memory context
>> for error handling, for releasing any LWLocks that were taken, etc.
>> There's no try_timestamptz_in function or anything, just
>> timestamptz_in, and it ERROR's if it doesn't like its input. You
>> cannot safely PG_TRY / PG_CATCH such an exception and continue
>> processing to, say, write another row.
>> 
>> Currently we also don't have a way to differentiate between
>> 
>> * "this row is structurally invalid" (wrong number of columns, etc)
>> * "this row is structually valid but has fields we could not parse
>> into their data types"
>> * "this row looks structurally valid and has data types we could
>> parse, but does not satisfy a constraint on the destination table"
>> 
>> Nor do we have a way to write to any kind of failure-log table in the
>> database, since a simple approach relies on aborting subtransactions
>> to clean up failed inserts so it can't write anything for failed rows.
>> Not without starting a 2nd subxact to record the failure, anyway.
> 
> If we are optimising COPY for the case with a small amount of bad rows,
> then a 2nd subtransaction seems not a bad idea. We can try to
> apply the batch in a subtx; if it fails on some row N then insert rows [1, N)
> in the next subtx, report an error, commit the subtx. Row N+1 can be treated
> as the beginning of the next batch.
> 
> 
> But if there are problems with handling everything with
> subtransactions, and since parallelism is mentioned anyway, what about
> starting a bgworker that will perform data insertion and will be controlled
> by the backend?
> 
> For example backend can do following:
> 
> * Start a bgworker (or even a parallel worker)
> * Get a chunk of rows out of the file and try to apply them in batch
> as a subtransaction in the bgworker.
> * If it fails then we can open a transaction in the backend itself and
> raise a notice / move failed rows to a special errors table.
> 
>> So, having said why it's hard, I don't really have much for you in
>> terms of suggestions for ways forward. User-defined data types,
>> user-defined constraints and triggers, etc mean anything involving
>> significant interface changes will be a hard sell, especially in
>> something pretty performance-sensitive like COPY.
>> 
>> I guess it'd be worth setting out your goals first. Do you want to
>> handle all the kinds of problems above? Malformed  rows, rows with
>> malformed field values, and rows that fail to satisfy a constraint? or
>> just some subset?
>> 
>> 
>> 
>> -- 
>> Craig Ringer   http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>> 
>> 
>> -- 
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org 
>> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers 
>> 


Re: [HACKERS] [COMMITTERS] pgsql: Clean up Perl code according to perlcritic

2017-03-27 Thread Tom Lane
Peter Eisentraut  writes:
> Clean up Perl code according to perlcritic

This seems to have broken the regression tests (specifically, dblink)
on at least some of the Windows buildfarm critters.  It looks like
something got changed in the behavior of build-tree path expansion:
this in the input/paths.source file

CREATE FUNCTION putenv(text)
   RETURNS void
   AS '@libdir@/regress@DLSUFFIX@', 'regress_putenv'
   LANGUAGE C STRICT;

results in

  CREATE FUNCTION putenv(text)
 RETURNS void
 AS 'C:/buildfarm/HEAD/pgsql.build/contrib/dblink/$(top_builddir)/src/test/regress/regress.dll', 'regress_putenv'
 LANGUAGE C STRICT;
+ ERROR:  could not access file "C:/buildfarm/HEAD/pgsql.build/contrib/dblink/$(top_builddir)/src/test/regress/regress.dll": No such file or directory

which is surely not what it ought to be.

regards, tom lane




Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Fabien COELHO



0001+0002 patches, primarily for ease of review. I will be following with a
single v28 patch shortly.


Applies cleanly. Make check ok. I think it behaves as the committers last
required. Let us try again with them...


--
Fabien.




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Andres Freund
Hi,

On 2017-03-28 00:19:29 +0300, Stas Kelvich wrote:
> Ok, here it is.

On a very quick skim, this doesn't seem to solve the issues around
deadlocks of prepared transactions vs. catalog tables.  What if the
prepared transaction contains something like LOCK pg_class; (there's a
lot more realistic examples)? Then decoding won't be able to continue,
until that transaction is committed / aborted?

- Andres




Re: [HACKERS] logical decoding of two-phase transactions

2017-03-27 Thread Stas Kelvich

> On 27 Mar 2017, at 16:29, Craig Ringer  wrote:
> 
> On 27 March 2017 at 17:53, Stas Kelvich  wrote:
> 
>> I heavily underestimated the amount of changes there, but am almost finished
>> and will send an updated patch in several hours.
> 
> Oh, brilliant! Please post whatever you have before you knock off for
> the day anyway, even if it's just a WIP, so I can pick it up tomorrow
> my time and poke at its tests etc.
> 

Ok, here it is.

Major differences comparing to previous version:

* GID is stored in commit/abort records only when wal_level >= logical.

* More consistency about storing and parsing origin info. Now it
is stored in prepare and abort records when repsession is active.

* Some cleanup, function renames to get rid of the xact_even/gid fields
in ReorderBuffer, which I used only to copy them to ReorderBufferTXN.

* Changed the output plugin interface to the one that was suggested upthread.
Now prepare/CP/AP are separate callbacks, and if none of them is set
then a 2pc tx will be decoded as 1pc to provide backward compatibility.

* New callback filter_prepare() that can be used to switch between the
1pc/2pc styles of decoding a 2pc tx.

* test_decoding uses the new API and filters out aborted and running prepared tx.
It is actually easy to move the unlock of 2PCState to the prepare callback to
allow decoding of a running tx, but since that extension is an example, ISTM it
is better not to hold that lock during the whole prepare decoding. However, I
left enough information there about this and about the case when those locks
are not needed at all (when we are coordinating this tx).
  Talking about locking of a running prepared tx during decode, I think a
better solution would be to use our own custom lock here and register a
XACT_EVENT_PRE_ABORT callback in the extension to conflict with this lock.
Decode should hold it in shared mode, while commit holds it in exclusive mode.
That will allow locking things granularly and blocking only the tx that is
being decoded (see the sketch after this list).
  However, we don’t have XACT_EVENT_PRE_ABORT, but it is several LOCs to
add it. Should I?

* It actually doesn’t pass one of my regression tests. I’ve added the expected
output as it should be. I’ll try to send a follow-up message with a fix, but
right now I'm sending it as is, as you asked.
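
A minimal sketch of that locking idea (XACT_EVENT_PRE_ABORT does not exist
yet, and the lock and function names here are invented):

    #include "postgres.h"
    #include "access/xact.h"
    #include "storage/lwlock.h"

    /* hypothetical lock guarding decode of running prepared xacts,
     * e.g. obtained via RequestNamedLWLockTranche/GetNamedLWLockTranche */
    static LWLock *two_phase_decode_lock;

    static void
    two_phase_decode_xact_callback(XactEvent event, void *arg)
    {
        /* XACT_EVENT_PRE_ABORT would be the proposed new event */
        if (event == XACT_EVENT_PRE_COMMIT)
        {
            /* conflict with any in-progress decode of this xact */
            LWLockAcquire(two_phase_decode_lock, LW_EXCLUSIVE);
            LWLockRelease(two_phase_decode_lock);
        }
    }

The decoding side would hold two_phase_decode_lock in LW_SHARED mode for the
duration of the prepare decode, so only the transaction actually being
decoded is blocked.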




logical_twophase.diff
Description: Binary data




Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS] O(1) DSM handle operations

2017-03-27 Thread Thomas Munro
Hi hackers,

This is just a thought for discussion, no patch attached...

DSM operations dsm_create(), dsm_attach(), dsm_unpin_segment() perform
linear searches of the dsm_control->item array for either a free slot
or a slot matching a given handle.  Maybe no one thinks this is a
problem, because in practice the number of DSM slots you need to scan
should be something like number of backends * some small factor at
peak.

On the other hand it makes DSM slots a scarce resource and creates an
incentive for us never to allow many segments to be created, which has
knock-on implications, like dsa.c's geometric growth pattern: it
starts allocating bigger and bigger DSM segments so that we don't run
out of DSM slots which increases fragmentation (ie allocation of
memory you don't need).

Would it make sense to implement $SUBJECT like this?

1.  Perform O(1) handle lookup by using N bits of dsm_handle to
identify a dsm_control->item slot number, and detect ABA slot reuse by
using the rest of the bits of dsm_handle to hold a counter which must
match a cycling counter in the control slot.  This would replace the
current scheme where handles are random numbers.
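
For concreteness, the encoding in (1) might look something like this (the
bit split and names are hypothetical):

    #include "postgres.h"
    #include "storage/dsm.h"

    #define DSM_SLOT_BITS   10
    #define DSM_SLOT_MASK   ((1U << DSM_SLOT_BITS) - 1)

    /* low bits select the control slot, high bits detect ABA reuse */
    static inline uint32
    dsm_handle_slot(dsm_handle handle)
    {
        return handle & DSM_SLOT_MASK;
    }

    static inline uint32
    dsm_handle_counter(dsm_handle handle)
    {
        return handle >> DSM_SLOT_BITS;
    }

A lookup would succeed only if dsm_handle_counter(handle) matches the
cycling counter stored in slot dsm_handle_slot(handle).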

2.  Perform O(1) slot allocation with a freelist using a
next-free-slot link pointer in the slots themselves.

Perhaps the freelist from (2) should be a FIFO queue so that slots are
reused as slowly as possible, strengthening the counter scheme from
(1), or perhaps that's optimising for the wrong thing, I don't know.
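
And a sketch of the freelist in (2), again with invented names; the FIFO
variant pops from the head and pushes onto the tail so slot numbers are
recycled as slowly as possible:

    #include "postgres.h"

    #define INVALID_SLOT ((uint32) -1)

    typedef struct dsm_control_item_sketch
    {
        uint32      counter;    /* reuse counter, for the handle scheme */
        uint32      next_free;  /* freelist link when the slot is free */
    } dsm_control_item_sketch;

    typedef struct dsm_control_sketch
    {
        uint32      free_head;
        uint32      free_tail;
        dsm_control_item_sketch item[1024];
    } dsm_control_sketch;

    static uint32
    slot_alloc(dsm_control_sketch *ctl)
    {
        uint32      slot = ctl->free_head;

        if (slot != INVALID_SLOT)
        {
            ctl->free_head = ctl->item[slot].next_free;
            if (ctl->free_head == INVALID_SLOT)
                ctl->free_tail = INVALID_SLOT;
        }
        return slot;            /* O(1), no linear scan */
    }

    static void
    slot_free(dsm_control_sketch *ctl, uint32 slot)
    {
        ctl->item[slot].next_free = INVALID_SLOT;
        if (ctl->free_tail == INVALID_SLOT)
            ctl->free_head = slot;
        else
            ctl->item[ctl->free_tail].next_free = slot;
        ctl->free_tail = slot;  /* append, so reuse happens last */
    }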

The biggest problem with point (1) above would seem to be the SysV
implementation, where dsm_handle is transmogrified into key_t and some
of the bits might get chomped if the latter is smaller, which seems to
imply that at least the slot identification bits would need to be sure
to fit into key_t, and also that the special key_t sentinel
value IPC_PRIVATE somehow needs to be avoided.  Do any systems
really have sizeof(key_t) < 4?

Point (2) seems to be implementable independently of (1).

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Robert Haas
On Thu, Mar 23, 2017 at 2:47 PM, Pavan Deolasee
 wrote:
> It's quite hard to say that until we see many more benchmarks. As author of
> the patch, I might have got repetitive with my benchmarks. But I've seen
> over 50% improvement in TPS even without chain conversion (6 indexes on a 12
> column table test).

This seems quite mystifying.  What can account for such a large
performance difference in such a pessimal scenario?  It seems to me
that without chain conversion, WARM can only apply to each row once
and therefore no sustained performance improvement should be possible
-- unless rows are regularly being moved to new blocks, in which case
those updates would "reset" the ability to again perform an update.
However, one would hope that most updates get done within a single
block, so that the row-moves-to-new-block case wouldn't happen very
often.

I'm perplexed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-27 Thread Alvaro Herrera
Is the WARM tap test suite supposed to work when applied without all the
other patches?  I just tried applying that one and running "make check -C
src/test/modules", and it seems to hang after giving "ok 5" for
t/002_warm_stress.pl.  (I had to add a Makefile too, attached.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
# src/test/modules/indexing/Makefile

MODULE = indexing
PGFILEDESC = "indexing - Test indexing interfaces"

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = src/test/modules/indexing
top_builddir = ../../../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

check: prove-check

prove-check:
$(prove_check)



Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Corey Huinker
>
>
> 0001+0002 patches, primarily for ease of review. I will be following with a
> single v28 patch shortly.
>


0001-psql-if-v28.patch
Description: Binary data



Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Corey Huinker
On Mon, Mar 27, 2017 at 3:25 PM, Fabien COELHO  wrote:

>
> And here you go
>>
>
> Patch applies cleany, make check ok. Looks pretty good.
>
> A minor detail I have just noticed, sorry: now that options are discarded
> by functions, some string variable declarations should be moved back inside
> the active branch. You moved them out because you were sharing the
> variables between the active & inactive branches, but this is no longer
> necessary, and the project practice seems to declare variables just where
> they are needed. That would be pattern in d, encoding in encoding, fname in
> f and g and include and out and s, prefix in gset, opt in help, opt* in lo
> and pset and set, arg* in prompt, env* in setenv... and maybe a few others.
>
> --
> Fabien.
>

done:
encoding f g gset help include lo out prompt pset s set setenv sf sv t T
timing unset watch x z ! ?

weird cases where they're both still needed:
d write

0001+0002 patches, primarily for ease of review. I will be following with a
single v28 patch shortly.


0001-psql-if-v27.patch
Description: Binary data


0002-fix-var-scoping.patch
Description: Binary data



Re: [HACKERS] increasing the default WAL segment size

2017-03-27 Thread Beena Emerson
Hello,

On Sat, Mar 25, 2017 at 10:32 PM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> At this point, I suggest splitting this patch up into several
> potentially less controversial pieces.
>
> One big piece is that we currently don't support segment sizes larger
> than 64 GB, for various internal arithmetic reasons.  Your patch appears
> to address that.  So I suggest isolating that.  Assuming it works
> correctly, I think there would be no great concern about it.
>
> The next piece would be making the various tools aware of varying
> segment sizes without having to rely on a built-in value.
>
> The third piece would then be the rest that allows you to set the size
> at initdb
>
> If we take these in order, we would make it easier to test various sizes
> and see if there are any more unforeseen issues when changing sizes.  It
> would also make it easier to do performance testing so we can address
> the original question of what the default size should be.
>

PFA the patches divided into 3 parts:

02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes
the internal representation of max_wal_size and min_wal_size to mb.

03-modify-tools.patch - Makes XLogSegSize into a variable, currently set as
XLOG_SEG_SIZE and modifies the tools to fetch the size instead of using
inbuilt value.

04-initdb-walsegsize.patch - Adds the initdb option to set wal-segsize and
make related changes. Update pg_test_fsync to use DEFAULT_XLOG_SEG_SIZE
instead of XLOG_SEG_SIZE


> One concern I have is that your patch does not contain any tests.  There
> should probably be lots of tests.


05-initdb_tests.patch adds TAP tests to initialize a cluster with different
wal_segment_size values and then check the config values. What other tests do
you have in mind? Checking the various tools?

--
Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


01-add-XLogSegmentOffset-macro.patch
Description: Binary data


02-increase-max-wal-segsize.patch
Description: Binary data


03-modify-tools.patch
Description: Binary data


05-initdb_tests.patch
Description: Binary data


04-initdb-walsegsize.patch
Description: Binary data



Re: [HACKERS] \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)

2017-03-27 Thread Fabien COELHO



> And here you go


Patch applies cleanly, make check ok. Looks pretty good.

A minor detail I have just noticed, sorry: now that options are discarded
by functions, some string variable declarations should be moved back
inside the active branch. You moved them out because you were sharing the
variables between the active & inactive branches, but this is no longer
necessary, and the project practice seems to be to declare variables just
where they are needed. That would be pattern in d, encoding in encoding,
fname in f and g and include and out and s, prefix in gset, opt in help,
opt* in lo and pset and set, arg* in prompt, env* in setenv... and maybe a
few others.


--
Fabien.




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-27 Thread Andreas Seltenreich
Ashutosh Sharma writes:
>> TRAP: FailedAssertion("!(LWLockHeldByMe(((LWLock*)
>> (&(bufHdr)->content_lock))))", File: "bufmgr.c", Line: 3397)
> Thanks for reporting this problem. Could you please let me know on for
> how long did you run sqlsmith to get this crash.

I got about one TRAP per hour when testing on 20 nodes with one postgres
and 5 sqlsmithes on each.  Nodes are tiny consumer machines with
low-power 4-core sandy bridges.

> [2. reacquire_lock_hashkillitems_if_required.patch]

I'll test with your patch applied as soon as time permits and report
back.

regards,
Andreas




Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-27 Thread Jesper Pedersen

Hi Mithun,

On 03/26/2017 01:56 AM, Mithun Cy wrote:

Thanks, Amit for the review.
On Sat, Mar 25, 2017 at 7:03 PM, Amit Kapila  wrote:


I think the one-dimensional patch has fewer places to touch, so that looks
better to me.  However, I think there is still hard coding and there are
assumptions in the code which we should try to improve.


Great! I will continue with the one-dimensional 'spares' improvement.



I ran some performance scenarios on the patch to see if the increased
'spares' allocation had an impact. I haven't found any regressions in
that regard.


The attached patch contains some small fixes, mainly to the documentation,
on top of v7.


Best regards,
 Jesper

From 5545e48ab7136f17b3d471e0ee679a6db6040865 Mon Sep 17 00:00:00 2001
From: jesperpedersen 
Date: Mon, 27 Mar 2017 14:15:00 -0400
Subject: [PATCH] Small fixes

---
 src/backend/access/hash/README | 50 +++---
 src/backend/access/hash/hashpage.c | 26 ++--
 src/backend/access/hash/hashutil.c | 24 +-
 3 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/src/backend/access/hash/README b/src/backend/access/hash/README
index 6721ee1..ca46de7 100644
--- a/src/backend/access/hash/README
+++ b/src/backend/access/hash/README
@@ -58,50 +58,50 @@ rules to support a variable number of overflow pages while not having to
 move primary bucket pages around after they are created.
 
 Primary bucket pages (henceforth just "bucket pages") are allocated in
-power-of-2 groups, called "split points" in the code.  That means on every new
-split point we double the existing number of buckets.  And, it seems bad to
-allocate huge chunks of bucket pages all at once and we take ages to consume it.
-To avoid this exponential growth of index size, we did a trick to breakup
-allocation of buckets at splitpoint into 4 equal phases.  If 2^x is the total
-buckets need to be allocated at a splitpoint (from now on we shall call this
-as splitpoint group), then we allocate 1/4th (2^(x - 2)) of total buckets at
-each phase of splitpoint group. Next quarter of allocation will only happen if
+power-of-2 groups, called "split points" in the code.  That means at every new
+split point we double the existing number of buckets.  Allocating huge chunks
+of bucket pages all at once isn't optimal and we will take ages to consume those.
+To avoid this exponential growth of index size, we use a trick to break up the
+allocation of buckets at the split points into 4 equal phases.  If 2^x is the total
+buckets needed to be allocated at a split point (from now on we shall call this
+a split point group), then we allocate 1/4th (2^(x - 2)) of total buckets at
+each phase of the split point group. Next quarter of allocation will only happen if
 buckets of previous phase has been already consumed.  Since for buckets number
-< 4 we cannot further divide it in to multiple phases, the first splitpoint
+< 4 we cannot further divide it into multiple phases, the first split point
 group 0's allocation is done as follows {1, 1, 1, 1} = 4 buckets in total, the
 numbers in curly braces indicate number of buckets allocated within each phase
-of splitpoint group 0.  In next splitpoint group 1 the allocation phases will
-be as follow {1, 1, 1, 1} = 8 buckets in total.  And, for splitpoint group 2
+of split point group 0.  In the next split point group 1 the allocation phases
+will be as follows: {1, 1, 1, 1} = 8 buckets in total.  And, for split point group 2
 and 3 allocation phase will be {2, 2, 2, 2} = 16 buckets in total and
-{4, 4, 4, 4} = 32 buckets in total.  We can see that at each splitpoint group
-we double the total number of buckets from previous group but in a incremental
-phase.  The bucket pages allocated within one phase of a splitpoint group will
+{4, 4, 4, 4} = 32 buckets in total.  We can see that at each split point group
+we double the total number of buckets from the previous group, but in an incremental
+phase.  The bucket pages allocated within one phase of a split point group will
 appear consecutively in the index.  This addressing scheme allows the physical
 location of a bucket page to be computed from the bucket number relatively
 easily, using only a small amount of control information.  If we look at the
-function _hash_spareindex for a given bucket number we first compute splitpoint
-group it belongs to and then the phase with in it to which the bucket belongs
-to.  Adding them we get the global splitpoint phase number S to which the
+function _hash_spareindex for a given bucket number we first compute the split point
+group it belongs to and then the phase to which the bucket
+belongs.  Adding them we get the global split point phase number S to which the
 bucket belongs and then simply add "hashm_spares[S] + 1" (where hashm_spares[] is
 an array stored in the metapage) with given bucket number to compute its
 physical address.  The hashm_spares[S] can be interpreted 
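
To make the addressing arithmetic above concrete, here is a small
self-contained C sketch of the scheme as described in this README text (an
illustration only, not the actual _hash_spareindex code):

#include <stdint.h>

/*
 * Group 0 covers buckets 0..3 in phases {1, 1, 1, 1}; group g >= 1 covers
 * buckets 2^(g+1) .. 2^(g+2)-1 in 4 phases of 2^(g-1) buckets each.  The
 * global phase number S is then 4 * group + phase.
 */
static uint32_t
splitpoint_phase(uint32_t bucket)
{
    uint32_t group;
    uint32_t phase;

    if (bucket < 4)
    {
        group = 0;
        phase = bucket;                 /* one new bucket per phase */
    }
    else
    {
        uint32_t log2b = 31 - __builtin_clz(bucket);    /* floor(log2) */

        group = log2b - 1;
        phase = (bucket - (1U << (group + 1))) >> (group - 1);
    }
    return group * 4 + phase;
}

For example, bucket 11 lies in group 2 (buckets 8..15), phase (11 - 8) / 2 = 1,
so S = 9; its physical address then follows from hashm_spares[S] as described
above.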

Re: [HACKERS] Unused argument in PinBuffer function

2017-03-27 Thread Alexander Korotkov
On Thu, Aug 4, 2016 at 2:28 PM, Ildar Musin  wrote:

> I was looking through the buffer manager's code and have noticed that
> function PinBuffer has an unused 'strategy' argument. It seems that after
> refactoring made by Alexander Korotkov and Andres Freund
> (48354581a49c30f5757c203415aa8412d85b0f70) it became useless. The
> attached patch removes it. Probably someone more advanced could edit the
> function description to reflect changes?
>

48354581 contained an unintentional change that made the 'strategy' argument
of PinBuffer unused.
Since it was an unintentional change, it was decided to put the old logic
back; see 9cf60332.
Thus, the 'strategy' argument of PinBuffer is now used.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> ... I don't have much
> personal enthusiasm for trying to figure out how to make dynamic
> loading on the postgres binary itself work everywhere, so if it falls
> to me to fix, it's likely to get a hard-coded check for some
> hard-coded name.

+1.  This seems like no time to be buying into brand new portability
requirements without a very pressing need to do so; and this patch
doesn't appear to create one.

regards, tom lane




Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki

2017-03-27 Thread Jan Michálek
2017-03-23 17:26 GMT+01:00 Pierre Ducroquet :

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> Hi
>
> This is my first review (Magnus said in his presentation in PGDay Paris
> that volunteers should just come and help, so here I am), so please notify
> me for any mistake I do when using the review tools...
>
> The feature seems to work as expected, but I don't claim to be a markdown
> and rst expert.
> Some minor issues with the code itself:
> - some indentation issues (documentation and code itself with a mix of
> space-based and tab-based indentation) and a few trailing spaces in code
>

corrected


> - typographic issues in the documentation :
>   - "The html, asciidoc, latex, latex-longtable, troff-ms, and markdown
> and rst formats" ==> duplicated and
>

corrected

>   - "Sets the output format to one of unaligned, aligned, wrapped, html,
> asciidoc, latex (uses tabular), latex-longtable, rst, markdown, or
> troff-ms." ==> extra comma at the end of the list
> - the comment " dont add line after last row, because line is added after
> every row" is misleading, it should warn that it's only for rst
> - there is a block of commented out code left
> - in the print_aligned_vertical function, there is a mix between
> "cont->opt->format == PRINT_RST" and "format == _rst" and I don't see
> any obvious reason for that
>
corrected

> - the documentation doesn't mention (but ok, it's kind of obvious) that
> the linestyle option will not work with rst and markdown
>
>
This patch corrects (I hope; I had the correct changes in vimrc) the
indentation issues. Please look at whether it is OK (I mean the indentation)
and at some minor errors. And it should work on current master (probably).

Have a nice day

Jan


> Thanks !
>
> The new status of this patch is: Waiting on Author
>



-- 
Jelen
Starší čeledín datovýho chlíva
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 2a9c412020..ee0c8fbea6 100644
*** a/doc/src/sgml/ref/psql-ref.sgml
--- b/doc/src/sgml/ref/psql-ref.sgml
***
*** 2366,2372  lo_import 152801
aligned, wrapped,
html, asciidoc,
latex (uses tabular),
!   latex-longtable, or
troff-ms.
Unique abbreviations are allowed.  (That would mean one letter
is enough.)
--- 2366,2373 
aligned, wrapped,
html, asciidoc,
latex (uses tabular),
! 		  latex-longtable, 
! 		  rst, markdown, or
troff-ms.
Unique abbreviations are allowed.  (That would mean one letter
is enough.)
***
*** 2394,2400  lo_import 152801
  

The html, asciidoc, latex,
!   latex-longtable, and troff-ms
formats put out tables that are intended to
be included in documents using the respective mark-up
language. They are not complete documents! This might not be
--- 2395,2402 
  

The html, asciidoc, latex,
! 		  latex-longtable, troff-ms,
! 		  markdown and rst
formats put out tables that are intended to
be included in documents using the respective mark-up
language. They are not complete documents! This might not be
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4f4a0aa9bd..97f4f37923 100644
*** a/src/bin/psql/command.c
--- b/src/bin/psql/command.c
***
*** 2518,2523  _align2string(enum printFormat in)
--- 2518,2529 
  		case PRINT_TROFF_MS:
  			return "troff-ms";
  			break;
+ 		case PRINT_MARKDOWN:
+ 			return "markdown";
+ 			break;
+ 		case PRINT_RST:
+ 			return "rst";
+ 			break;
  	}
  	return "unknown";
  }
***
*** 2589,2597  do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
  			popt->topt.format = PRINT_LATEX_LONGTABLE;
  		else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
  			popt->topt.format = PRINT_TROFF_MS;
  		else
  		{
! 			psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, asciidoc, latex, latex-longtable, troff-ms\n");
  			return false;
  		}
  	}
--- 2595,2607 
  			popt->topt.format = PRINT_LATEX_LONGTABLE;
  		else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
  			popt->topt.format = PRINT_TROFF_MS;
+ 		else if (pg_strncasecmp("markdown", value, vallen) == 0)
+ 			popt->topt.format = PRINT_MARKDOWN;
+ 		else if (pg_strncasecmp("rst", value, vallen) == 0)
+ 			popt->topt.format = PRINT_RST;
  		else
  		{
! 			psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, asciidoc, latex, latex-longtable, troff-ms, markdown, rst\n");

Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 13:30:11 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
> >> My feeling at this point is that we might be better off disabling
> >> the computed-goto case by default.  At the very least, we're going
> >> to need a version check that restricts it to latest gcc.
> 
> > In my measurements it's still faster in at least gcc-5/6, even without
> > the option (largely because it avoids array bounds checks on the jump
> > table built for the switch).
> 
> Hm.  What test cases are you using?

I used TPC-H - seems like a realistic enough test case.

Andres




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
>> My feeling at this point is that we might be better off disabling
>> the computed-goto case by default.  At the very least, we're going
>> to need a version check that restricts it to latest gcc.

> In my measurements it's still faster in at least gcc-5/6, even without
> the option (largely because it avoids array bounds checks on the jump
> table built for the switch).

Hm.  What test cases are you using?

regards, tom lane




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Andres Freund
On 2017-03-27 13:01:11 -0400, Robert Haas wrote:
> On Mon, Mar 27, 2017 at 12:50 PM, Andres Freund  wrote:
> > Robert, Petr, either of you planning to fix this (as outlined elsewhere
> > in the thread)?
> 
> Oh, I didn't realize anybody was looking to me to fix this.

Well, it's borked in 9.6.  I'm starting to get annoyed by culicidae's
failures ;)


> I sort of thought that it was fallout from the logical replication
> patch and that Petr or Peter would deal with it.  If that's not the
> case, I'm not totally unwilling to take a whack at it, but I don't
> have much personal enthusiasm for trying to figure out how to make
> dynamic loading on the postgres binary itself work everywhere, so if
> it falls to me to fix, it's likely to get a hard-coded check for some
> hard-coded name.

I'm all for that approach - there seems very little upside in the
dynamic loading approach.  Just defining a bgw_entry_points[enum
BuiltinBGWorkerType] -> bgworker_main_type array seems to be simple
enough - it's not like we're going to add new types of builtin bgworkers
at runtime.

- Andres
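
For illustration, such a table could look roughly like this (a sketch of the
suggestion with assumed entry names, not an actual patch; the stub typedef
stands in for the real headers):

#include <stdint.h>

/* Sketch: pass a small enum through shared memory and resolve the entry
 * point in each backend, so no function pointer has to survive exec()
 * under EXEC_BACKEND with ASLR. */
typedef uintptr_t Datum;        /* stub for the real postgres.h typedef */
typedef void (*bgworker_main_type) (Datum main_arg);

extern void ApplyLauncherMain(Datum main_arg);
extern void ParallelWorkerMain(Datum main_arg);

typedef enum BuiltinBGWorkerType
{
    BGWORKER_LOGICAL_REPLICATION_LAUNCHER,
    BGWORKER_PARALLEL_WORKER
    /* ... one value per builtin bgworker type ... */
} BuiltinBGWorkerType;

static const bgworker_main_type bgw_entry_points[] = {
    [BGWORKER_LOGICAL_REPLICATION_LAUNCHER] = ApplyLauncherMain,
    [BGWORKER_PARALLEL_WORKER] = ParallelWorkerMain,
};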




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:50 PM, Andres Freund  wrote:
> Robert, Petr, either of you planning to fix this (as outlined elsewhere
> in the thread)?

Oh, I didn't realize anybody was looking to me to fix this.  I sort of
thought that it was fallout from the logical replication patch and
that Petr or Peter would deal with it.  If that's not the case, I'm
not totally unwilling to take a whack at it, but I don't have much
personal enthusiasm for trying to figure out how to make dynamic
loading on the postgres binary itself work everywhere, so if it falls
to me to fix, it's likely to get a hard-coded check for some
hard-coded name.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Robert Haas
On Fri, Mar 24, 2017 at 3:49 AM, Amit Kapila  wrote:
> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma  
> wrote:
>>>
>>> I think it would have been probably okay to use *int* for ntuples as
>>> that matches with what you are actually assigning in the function.
>>
>> okay, corrected it. Attached is newer version of patch.
>
> Thanks, this version looks good to me.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-27 Thread Andres Freund
On 2017-03-16 10:13:37 +0100, Petr Jelinek wrote:
> On 16/03/17 09:53, Andres Freund wrote:
> > On 2017-03-16 09:40:48 +0100, Petr Jelinek wrote:
> >> On 16/03/17 04:42, Andres Freund wrote:
> >>> On 2017-03-15 20:28:33 -0700, Andres Freund wrote:
>  Hi,
> 
>  I just unstuck a bunch of my buildfarm animals.  That triggered some
>  spurious failures (on piculet, calliphoridae, mylodon), but also one
>  that doesn't really look like that:
>  https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2002%3A40%3A03
> 
>  with the pertinent point being:
> 
>  == stack trace: 
>  pgsql.build/src/test/regress/tmp_check/data/core ==
>  [New LWP 1894]
>  [Thread debugging using libthread_db enabled]
>  Using host libthread_db library 
>  "/lib/x86_64-linux-gnu/libthread_db.so.1".
>  Core was generated by `postgres: bgworker: logical replication launcher  
>    '.
>  Program terminated with signal SIGSEGV, Segmentation fault.
>  #0  0x55e265bff5e3 in ?? ()
>  #0  0x55e265bff5e3 in ?? ()
>  #1  0x55d3ccabed0d in StartBackgroundWorker () at 
>  /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c:792
>  #2  0x55d3ccacf4fc in SubPostmasterMain (argc=3, 
>  argv=0x55d3cdbb71c0) at 
>  /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4878
>  #3  0x55d3cca443ea in main (argc=3, argv=0x55d3cdbb71c0) at 
>  /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:205
> 
>  it's possible that me killing things and upgrading caused this, but
>  given this is a backend running EXEC_BACKEND, I'm a bit suspicous that
>  it's more than that.  The machine is a bit backed up at the moment, so
>  it'll probably be a while till it's at that animal/branch again,
>  otherwise I'd not have mentioned this.
> >>>
> >>> For some reason it ran again pretty soon. And I'm afraid it's indeed an
> >>> issue:
> >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2003%3A30%3A02
> >>>
> >>
> >> Hmm, I tried with EXEC_BACKEND (and with --disable-spinlocks) and it
> >> seems to work fine on my two machines. I don't see anything else
> >> different on culicidae though. Sadly the backtrace is not that
> >> informative either. I'll try to investigate more but it will take time...
> > 
> > Worthwhile additional failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2002%3A55%3A01
> > 
> > Same animal, also EXEC_BACKEND, but 9.6.
> > 
> > A quick look at the relevant line:
> > /*
> >  * If bgw_main is set, we use that value as the initial entrypoint.
> >  * However, if the library containing the entrypoint wasn't loaded at
> >  * postmaster startup time, passing it as a direct function pointer is 
> > not
> >  * possible.  To work around that, we allow callers for whom a function
> >  * pointer is not available to pass a library name (which will be 
> > loaded,
> >  * if necessary) and a function name (which will be looked up in the 
> > named
> >  * library).
> >  */
> > if (worker->bgw_main != NULL)
> > entrypt = worker->bgw_main;
> > 
> > makes the issue clear - we appear to be assuming that bgw_main is
> > meaningful across processes.  Which it isn't in the EXEC_BACKEND case
> > when ASLR is in use...
> > 
> > This kinda sounds familiar, but a quick google search doesn't find
> > anything relevant.

Robert, Petr, either of you planning to fix this (as outlined elsewhere
in the thread)?


> Hmm now that you mention it, I remember discussing something similar
> with you last year in Dallas in regards to parallel query. IIRC Windows
> should not have this problem but other systems with EXEC_BACKEND do.
> Don't remember the details though.

Don't think that's reliable, only works as long as the binary is
compiled without position independent code.

Greetings,

Andres Freund




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:36 PM, Rushabh Lathia
 wrote:
> Hmm, I agree that it's a good idea, and I will work on that as a separate patch.

Maybe you want to start with what David already posted?

>> Possibly we
>> should fix the crash bug first, though, and then do that afterwards.
>> What bugs me a little about Rushabh's fix is that it looks like magic.
>> You have to know that we're looping over two things and freeing them
>> up, but there's one more of one thing than the other thing.  I think
>> that at least needs some comments or something.
>>
> So in my second version of the patch I changed gather_merge_clear_slots() to
> just clear the workers' slots, plus some other cleanup. gather_merge_getnext()
> now also returns NULL when all the queues and the heap are exhausted, where
> it previously returned the cleared slot. This way we make sure that we don't
> run over freeing the leader's slot, and gather_merge_getnext() doesn't need
> to depend on that cleared slot.

Ah, I missed that.  That does seem cleaner.  Anybody see a problem
with that approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 12:18:37 -0400, Tom Lane wrote:
> I wrote:
> > As to the point of whether it actually helps or not ...
> > on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> > the dispatches getting routed through a common location, to *all* of them
> > (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> > survives as a separate jump).  This seems like a bug, but there it is.
> 
> So after a bit of googling, this is a very longstanding complaint:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785
> 
> (hm, think I know the second submitter)

I don't think that's precisely the same issue - as long as some of the
goto branches survive, we're not hitting the full brunt of the compgoto
thing.  I think we're essentially avoiding some of that because we're
"precomputing" the dispatch_table lookup.

To count the number of jumps I used:
gdb -batch -ex 'disassemble/s ExecInterpExpr' execExprInterp.o | grep jmpq | grep -v ExecInterpExpr | wc -l
which'll only include indirect jumps (since otherwise jumps will look
like "jmpq   0x35f2 ")

                    standard flags    -fno-crossjumping
gcc-5 (5.4.1-8):    34                82
gcc-6 (6.3.0-8):    34                82
gcc-7 (7.0.1):      71                108
gcc-snapshot:       72                108

So that doesn't look too bad.


> My feeling at this point is that we might be better off disabling
> the computed-goto case by default.  At the very least, we're going
> to need a version check that restricts it to latest gcc.

In my measurements it's still faster in at least gcc-5/6, even without
the option (largely because it avoids array bounds checks on the jump
table built for the switch).

Greetings,

Andres Freund
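
For anyone reproducing the measurement, here is a minimal self-contained toy
of the computed-goto dispatch style being counted (GCC's labels-as-values
extension; an illustration, not the actual execExprInterp.c code).  Each
EEO_NEXT-style macro expands to its own indirect jump, which is what the
jmpq numbers above count:

/* Toy opcode interpreter using GCC computed goto; illustration only. */
typedef enum { OP_CONST, OP_ADD, OP_DONE } OpCode;

static int
eval(const OpCode *op, const int *args)
{
    static const void *dispatch[] = {&&do_const, &&do_add, &&do_done};
    int acc = 0;

/* Each dispatch site below compiles to its own "jmpq *..." instruction. */
#define EEO_NEXT() goto *dispatch[*++op]

    goto *dispatch[*op];
do_const:
    acc = *args++;
    EEO_NEXT();
do_add:
    acc += *args++;
    EEO_NEXT();
do_done:
    return acc;
}

With a plain switch the compiler is free to funnel every dispatch through a
single jump-table lookup; -fno-crossjumping discourages merging the
per-opcode jumps back together, which is the difference the table above shows.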




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas  wrote:

> On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
>  wrote:
> > On 28 March 2017 at 04:57, Robert Haas  wrote:
> >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
> >>  wrote:
> >>> About the original issue reported by Tomas, I did more debugging and
> >>> found that the problem was that gather_merge_clear_slots() was not
> >>> returning the cleared slot when nreaders is zero (meaning
> >>> nworkers_launched = 0). Because of that, the scan continued even after
> >>> all the tuples were exhausted, and we ended up with a server crash in
> >>> gather_merge_getnext(). In the patch I also added an Assert into
> >>> gather_merge_getnext() that the index should be less than nreaders + 1
> >>> (leader).
> >>
> >> Well, you and David Rowley seem to disagree on what the fix is here.
> >> His patches posted upthread do A, and yours do B, and from a quick
> >> look those things are not just different ways of spelling the same
> >> underlying fix, but actually directly conflicting ideas about what the
> >> fix should be.  Any chance you can review his patches, and maybe he
> >> can review yours, and we could try to agree on a consensus position?
> >> :-)
> >
> > When comparing Rushabh's to my minimal patch, both change
> > gather_merge_clear_slots() to clear the leader's slot. My fix
> > mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> > leader's slot, but seems that's not required since
> > gather_merge_readnext() sets the leader's slot to the output of
> > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> > mistake. Rushabh's patch sidesteps this by adding a conditional
> > pfree() to not free slot that we didn't allocate in the first place.
> >
> > I do think the code could be improved a bit. I don't like the way
> > GatherMergeState's nreaders and nworkers_launched are always the same.
> > I think this all threw me off a bit and may have been the reason for
> > the bug in the first place.
>
> Yeah, if those fields are always the same, then I think that they
> should be merged.  That seems undeniably a good idea.


Hmm, I agree that it's a good idea, and I will work on that as a separate patch.


> Possibly we
> should fix the crash bug first, though, and then do that afterwards.
> What bugs me a little about Rushabh's fix is that it looks like magic.
> You have to know that we're looping over two things and freeing them
> up, but there's one more of one thing than the other thing.  I think
> that at least needs some comments or something.
>
>
So in my second version of the patch I changed gather_merge_clear_slots() to
just clear the workers' slots, plus some other cleanup. gather_merge_getnext()
now also returns NULL when all the queues and the heap are exhausted, where
it previously returned the cleared slot. This way we make sure that we don't
run over freeing the leader's slot, and gather_merge_getnext() doesn't need
to depend on that cleared slot.


Thanks,
Rushabh Lathia
www.EnterpriseDB.com
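
To visualize the control flow described here, a rough sketch (the stub types
stand in for the real executor structures; this is not the patch itself):

/* Sketch: once every worker queue and the leader are exhausted, the heap
 * is empty and we return NULL rather than a specially cleared slot. */
typedef struct TupleTableSlot TupleTableSlot;

typedef struct GatherMergeStateSketch
{
    void *gm_heap;              /* binary heap of per-source indexes */
    TupleTableSlot **gm_slots;  /* the workers' slots plus the leader's */
} GatherMergeStateSketch;

extern int heap_is_empty(void *heap);   /* stand-in for binaryheap_empty */
extern int heap_first(void *heap);      /* stand-in for binaryheap_first */

static TupleTableSlot *
gather_merge_getnext_sketch(GatherMergeStateSketch *gm_state)
{
    if (heap_is_empty(gm_state->gm_heap))
        return NULL;            /* all queues and the heap are exhausted */

    /* Otherwise return the slot of the source with the lowest key. */
    return gm_state->gm_slots[heap_first(gm_state->gm_heap)];
}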


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
>>> Since this has now come up twice, I suggest adding a comment there
>>> that explains why we're intentionally ignoring max_parallel_workers.
>
>> Good idea.  How about the attached?
>
> WFM ... but seems like there should be some flavor of this statement
> in the user-facing docs too (ie, "max_parallel_workers_per_gather >
> max_parallel_workers is a bad idea unless you're trying to test what
> happens when a plan can't get all the workers it planned for").  The
> existing text makes some vague allusions suggesting that the two
> GUCs might be interrelated, but I think it could be improved.

Do you have a more specific idea?  I mean, this seems like a
degenerate case of what the documentation for
max_parallel_workers_per_gather says already. Even if
max_parallel_workers_per_gather <= Min(max_worker_processes,
max_parallel_workers), it's quite possible that you'll regularly be
generating plans that can't obtain the budgeted number of workers.
The only thing that is really special about the case where
max_parallel_workers_per_gather > Min(max_worker_processes,
max_parallel_workers) is that this can happen even on an
otherwise-idle system.  I'm not quite sure how to emphasize that
without seeming to state the obvious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Teodor Sigaev

Thank you, pushed

Michael Paquier wrote:

On Fri, Mar 24, 2017 at 11:36 PM, Teodor Sigaev  wrote:

And the renaming of pg_clog to pg_xact is also my fault. Attached is
an updated patch.



Thank you. One more question: what about symlinks? If a DBA moves, for
example, pg_xact to another disk and leaves a symlink in the data directory.
I suppose fsync on a symlink will actually do nothing.


I did not think of this case, but is that really common? There is even
no option to configure that at the command level. And surely we cannot
support any fancy scenario that people figure out using PGDATA.
Existing callers of fsync_fname don't bother about this case either, by
the way; take the calls related to pg_logical and pg_repslot.

If something should be done in this area, that would be surely in
fsync_fname directly to centralize all the calls, and I would think of
that as a separate patch, and a separate discussion.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Ashutosh Sharma
>> >>
>> >> I think it would have been probably okay to use *int* for ntuples as
>> >> that matches with what you are actually assigning in the function.
>> >
>> > okay, corrected it. Attached is newer version of patch.
>> >
>>
>> Thanks, this version looks good to me.
>
>
> It solves the problem for me.

Great! Thanks for confirming.

> I'd like to test that I get the right answer
> on the standby, not just the absence of a crash, but I don't know how to do
> that effectively.  Has anyone used the new wal replay block consistency tool
> on hash indexes since this microvacuum code was committed?

Yes, I have used it for hash indexes, but only after the commit below:

commit 9abbf4727de746222ad8fc15b17348065389ae43
Author: Robert Haas 
Date:   Mon Mar 20 15:55:27 2017 -0400

Another fix for single-page hash index vacuum.

The WAL consistency checking code needed to be updated for the new
page status bit, but that didn't get done previously.

All I did was set wal_consistency_checking = 'hash' in
postgresql.conf and run test cases on the primary server. If there is any
inconsistent block on the standby, the tool would probably terminate the
recovery process and you would see the following message in the server
logfile.

"inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u"

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
>> Since this has now come up twice, I suggest adding a comment there
>> that explains why we're intentionally ignoring max_parallel_workers.

> Good idea.  How about the attached?

WFM ... but seems like there should be some flavor of this statement
in the user-facing docs too (ie, "max_parallel_workers_per_gather >
max_parallel_workers is a bad idea unless you're trying to test what
happens when a plan can't get all the workers it planned for").  The
existing text makes some vague allusions suggesting that the two
GUCs might be interrelated, but I think it could be improved.

regards, tom lane




Re: [HACKERS] Potential data loss of 2PC files

2017-03-27 Thread Teodor Sigaev

Thank you. One more question: what about symlinks? If a DBA moves, for
example, pg_xact to another disk and leaves a symlink in the data directory.
I suppose fsync on a symlink will actually do nothing.


I did not think of this case, but is that really common? There is even
no option to configure that at the command level.

I saw a lot of such cases.


If something should be done in this area, that would be surely in
fsync_fname directly to centralize all the calls, and I would think of
that as a separate patch, and a separate discussion.

Agree

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
 wrote:
> On 28 March 2017 at 04:57, Robert Haas  wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>>  wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that the problem was that gather_merge_clear_slots() was not
>>> returning the cleared slot when nreaders is zero (meaning
>>> nworkers_launched = 0). Because of that, the scan continued even after
>>> all the tuples were exhausted, and we ended up with a server crash in
>>> gather_merge_getnext(). In the patch I also added an Assert into
>>> gather_merge_getnext() that the index should be less than nreaders + 1
>>> (leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be.  Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() to not free slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.

Yeah, if those fields are always the same, then I think that they
should be merged.  That seems undeniably a good idea.  Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing.  I think
that at least needs some comments or something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [COMMITTERS] pgsql: Improve performance of find_all_inheritors()

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:08 PM, Teodor Sigaev  wrote:
> Improve performance of find_all_inheritors()
>
> Previous coding used three nested loops, which obviously were a pain for
> a large number of child tables. The patch replaces the inner loop with
> a hashmap.
>
> Author: Aleksander Alekseev
> Reviewed-by: me
>
> https://commitfest.postgresql.org/13/1058/

I'm a little worried that this will be noticeably slower when the
number of partitions is small, like 4 or 8 or 16.  In other places,
we've found it necessary to support both a list-based strategy and a
hash-based strategy to avoid regressing small cases (e.g.
join_rel_list + join_rel_hash in PlannerInfo, ResourceArray in
resowner.c, etc).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
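
As an illustration of the dual-strategy pattern mentioned above (a sketch of
the general technique, not the planner code; hash_lookup is an assumed
helper): keep a plain array while the set is small and switch to hash
lookups only past a threshold.

/* Sketch of the list-plus-hash pattern: linear scan below a threshold,
 * hash lookup above it. */
typedef struct OidSet
{
    unsigned *items;            /* small-case array */
    int nitems;
    void *hash;                 /* built lazily once nitems exceeds limit */
} OidSet;

extern int hash_lookup(void *hash, unsigned oid);   /* assumed helper */

static int
oidset_member(const OidSet *set, unsigned oid)
{
    if (set->hash == NULL)
    {
        int i;

        for (i = 0; i < set->nitems; i++)   /* cheap for small sets */
            if (set->items[i] == oid)
                return 1;
        return 0;
    }
    return hash_lookup(set->hash, oid);
}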




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Andres Freund
On 2017-03-27 11:52:05 -0400, Tom Lane wrote:
> As to the point of whether it actually helps or not ...
> 
> on gcc 6.3.1 (Fedora 25), yes it does.  Seems to be one "jmp *something"
> per EEO_NEXT or EEO_JUMP.
> 
> on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> the dispatches getting routed through a common location, to *all* of them
> (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> survives as a separate jump).  This seems like a bug, but there it is.

Gah :(.  I have gcc 5, 6, and 7 here, but nothing earlier.  I'm now inclined
to go the version-check route, for the individual versions we can (and
want to) confirm this on...  I'm not too concerned about not doing so on
gcc 4.4 or older...


> So this means we'd need some serious research to decide whether to apply
> it.  And I'm suspecting we'd end up with a compiler version test.

Indeed.

- Andres




Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
I wrote:
> As to the point of whether it actually helps or not ...
> on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
> the dispatches getting routed through a common location, to *all* of them
> (except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
> survives as a separate jump).  This seems like a bug, but there it is.

So after a bit of googling, this is a very longstanding complaint:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71785

(hm, think I know the second submitter)

I'm not sure we should be relying on a gcc bug fix that's barely four
months old.  Even assuming it fixes the issue without new regressions,
most people are not going to have it anytime soon.

My feeling at this point is that we might be better off disabling
the computed-goto case by default.  At the very least, we're going
to need a version check that restricts it to latest gcc.

regards, tom lane




Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-27 Thread Kevin Grittner
On Sat, Mar 25, 2017 at 9:24 PM, Mengxing Liu
 wrote:

> I've updated the proposal according to your comments.
> But I am still wondering: could you review it once more, as a
> double-check, to make sure I've made everything clear?

Additional comments added.

I'm afraid a few new issues came to mind reading it again.  (Nothing
serious; just some points that could benefit from a little
elaboration.)

--
Kevin Grittner




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread David Rowley
On 28 March 2017 at 04:57, Robert Haas  wrote:
> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>  wrote:
>> About the original issue reported by Tomas, I did more debugging and
>> found that the problem was that gather_merge_clear_slots() was not
>> returning the cleared slot when nreaders is zero (meaning
>> nworkers_launched = 0). Because of that, the scan continued even after
>> all the tuples were exhausted, and we ended up with a server crash in
>> gather_merge_getnext(). In the patch I also added an Assert into
>> gather_merge_getnext() that the index should be less than nreaders + 1
>> (leader).
>
> Well, you and David Rowley seem to disagree on what the fix is here.
> His patches posted upthread do A, and yours do B, and from a quick
> look those things are not just different ways of spelling the same
> underlying fix, but actually directly conflicting ideas about what the
> fix should be.  Any chance you can review his patches, and maybe he
> can review yours, and we could try to agree on a consensus position?
> :-)

When comparing Rushabh's to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free slot that we didn't allocate in the first place.

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-27 Thread Mike Palmiotto
On Mon, Mar 27, 2017 at 11:46 AM, Robert Haas  wrote:
> On Mon, Mar 27, 2017 at 11:22 AM, Mike Palmiotto
>  wrote:
>> On Mon, Mar 27, 2017 at 10:47 AM, Robert Haas  wrote:
>>> On Thu, Mar 9, 2017 at 9:47 AM, Stephen Frost  wrote:
 While going over the contrib modules, I noticed that sepgsql was not
 updated for partitioned tables.  What that appears to mean is that it's
 not possible to define labels on partitioned tables.
>>>
>>> It works for me:
>>>
>>> rhaas=# load 'dummy_seclabel';
>>> LOAD
>>> rhaas=# create table foo (a int, b text) partition by range (a);
>>> CREATE TABLE
>>> rhaas=# security label on table foo is 'classified';
>>> SECURITY LABEL
>>>
>>> What exactly is the problem you're seeing?
>>
>> IIRC the initial concern was that contrib/sepgsql was not updated for
>> RELKIND_PARTITIONED_TABLE, so the post-create hook would not work
>> properly for partitioned tables.
>>
>> I've looked into it a bit and saw a post-alter hook in
>> StoreCatalogInheritance1. It seems like it may just be an issue of
>> adding the RELKIND_PARTITIONED_TABLE to sepgsql_relation_post_create.
>
> I thought that kind of thing might be the issue, but it didn't seem to
> match Stephen's description.  Adding RELKIND_PARTITIONED_TABLE to that
> function seems like it would probably be sufficient to make that hook
> work, although that would need to be tested, but there are numerous
> other references to RELKIND_RELATION in contrib/sepgsql, some of which
> probably also need to be updated to consider
> RELKIND_PARTITIONED_TABLE.

Agreed.

>
> In my view, this is really an enhancement to sepgsql to handle a new
> PostgreSQL feature rather than a defect in the partitioning commit, so
> I don't think it falls on Amit Langote (as the author) or me (as the
> committer) to fix.  There might similarly be updates to sepgsql to do
> something with permissions on logical replication's new publication
> and subscription objects, but we should similarly regard those as
> possible new features, not things that Petr or Peter need to work on.

I'll keep these items in mind for future reference. Thanks.

>  Note that sepgsql hasn't been updated to work with RLS yet, either,
> but we didn't regard that as an open item for RLS, or if we did the
> resolution was just to document it.  I am not opposed to giving a
> little more time to get this straightened out, but if a patch doesn't
> show up fairly soon then I think we should just document that sepgsql
> doesn't support partitioned tables in v10.  sepgsql has a fairly
> lengthy list of implementation restrictions already, so one more is
> not going to kill anybody -- or if it will then that person should
> produce a patch soon.

Okay, I'll make sure I get something fleshed out today or tomorrow.

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com
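
For illustration, the kind of change under discussion might look like this
fragment (an assumed sketch against contrib/sepgsql's relkind switch, not
the actual patch being written):

/* In sepgsql_relation_post_create(), and similarly at the other
 * RELKIND_RELATION references in contrib/sepgsql: */
switch (classForm->relkind)
{
    case RELKIND_RELATION:
    case RELKIND_PARTITIONED_TABLE:     /* new: label these like tables */
        tclass = SEPG_CLASS_DB_TABLE;
        break;
    case RELKIND_SEQUENCE:
        tclass = SEPG_CLASS_DB_SEQUENCE;
        break;
    default:
        /* other relkinds handled as before */
        break;
}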




Re: [HACKERS] [COMMITTERS] pgsql: Allow vacuums to report oldestxmin

2017-03-27 Thread Masahiko Sawada
On Sun, Mar 26, 2017 at 2:26 AM, Masahiko Sawada  wrote:
> On Sun, Mar 26, 2017 at 1:37 AM, Fujii Masao  wrote:
>> On Mon, Mar 6, 2017 at 9:37 PM, Masahiko Sawada  
>> wrote:
>>> On Fri, Mar 3, 2017 at 10:50 PM, Simon Riggs  wrote:
 Allow vacuums to report oldestxmin

 Allow VACUUM and Autovacuum to report the oldestxmin value they
 used while cleaning tables, helping to make better sense out of
 the other statistics we report in various cases.

 Branch
 --
 master

 Details
 ---
 http://git.postgresql.org/pg/commitdiff/9eb344faf54a849898d9be012ddfa8204cfeb57c

 Modified Files
 --
 src/backend/commands/vacuumlazy.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)


>>>
>>> Should we change the example in vacuum.sgml file as well? Attached patch.
>>
>> "tuples" in the above should be "row versions"?
>> We should review not only this line but also all the lines in the example
>> of VERBOSE output, I think.
>
> Right. These verbose log messages are out of date. I ran
> VACUUM(VERBOSE, ANALYZE) with the same scenario as the current example as
> far as possible. The attached patch updates the verbose log messages.
>
>

Surprisingly, the change "tuples" -> "row versions" in vacuumlazy.c was
introduced by commit feb4f44d296b88b7f0723f4a4f3945a371276e0b in 2003.
If this patch is applied, it should be back-patched to all supported
branches.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] logical replication worker and statistics

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 11:14 AM, Fujii Masao  wrote:
> Shouldn't the logical replication worker call pgstat_report_stat()?
> Currently it doesn't seem to do that and no statistics about
> table accesses by logical replication workers are collected.
> For example, this can prevent autovacuum from working on
> those tables properly.

Yeah, that doesn't sound good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
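
A sketch of the sort of fix implied (the placement is an assumption for
illustration; pgstat_report_stat() and CommitTransactionCommand() are the
real backend functions):

#include <stdbool.h>

extern void CommitTransactionCommand(void);     /* PostgreSQL internals */
extern void pgstat_report_stat(bool force);

/* Sketch: in the apply worker, flush accumulated table-access stats after
 * committing a remote transaction, as a regular backend would. */
static void
apply_handle_commit_sketch(void)
{
    CommitTransactionCommand();
    pgstat_report_stat(false);  /* false = honor the usual rate limiting */
}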




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
 wrote:
> About the original issue reported by Tomas, I did more debugging and
> found that the problem was that gather_merge_clear_slots() was not
> returning the cleared slot when nreaders is zero (meaning
> nworkers_launched = 0). Because of that, the scan continued even after
> all the tuples were exhausted, and we ended up with a server crash in
> gather_merge_getnext(). In the patch I also added an Assert into
> gather_merge_getnext() that the index should be less than nreaders + 1
> (leader).

Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be.  Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] segfault in hot standby for hash indexes

2017-03-27 Thread Jeff Janes
On Fri, Mar 24, 2017 at 12:49 AM, Amit Kapila 
wrote:

> On Fri, Mar 24, 2017 at 12:25 PM, Ashutosh Sharma 
> wrote:
> >>
> >> I think it would have been probably okay to use *int* for ntuples as
> >> that matches with what you are actually assigning in the function.
> >
> > okay, corrected it. Attached is newer version of patch.
> >
>
> Thanks, this version looks good to me.
>

It solves the problem for me.  I'd like to test that I get the right answer
on the standby, not just the absence of a crash, but I don't know how to do
that effectively.  Has anyone used the new wal replay block consistency
tool on hash indexes since this microvacuum code was committed?

Jeff


Re: [HACKERS] WIP: Faster Expression Processing v4

2017-03-27 Thread Tom Lane
As to the point of whether it actually helps or not ...

on gcc 6.3.1 (Fedora 25), yes it does.  Seems to be one "jmp *something"
per EEO_NEXT or EEO_JUMP.

on gcc 4.4.7 (RHEL 6), it makes things *WORSE*.  We go from about half of
the dispatches getting routed through a common location, to *all* of them
(except one; for some odd reason the first EEO_NEXT in EEOP_NULLIF
survives as a separate jump).  This seems like a bug, but there it is.

So this means we'd need some serious research to decide whether to apply
it.  And I'm suspecting we'd end up with a compiler version test.

regards, tom lane




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia
>>  wrote:
>>> But it seems a bit futile to produce the parallel plan in the first place,
>>> because with max_parallel_workers=0 we can't possibly get any parallel
>>> workers ever. I wonder why compute_parallel_worker() only looks at
>>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>>> Perhaps this was discussed and is actually intentional, though.
>
>> It was intentional.  See the last paragraph of
>> https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com
>
> Since this has now come up twice, I suggest adding a comment there
> that explains why we're intentionally ignoring max_parallel_workers.

Hey, imagine if the comments explained the logic behind the code!

Good idea.  How about the attached?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


add-worker-comment.patch
Description: Binary data


