Re: [HACKERS] Rename max_parallel_degree?

2016-09-16 Thread Amit Kapila
On Fri, Sep 16, 2016 at 11:54 PM, Robert Haas  wrote:
>
> +   /*
> +* We need a write barrier to make sure the update of
> +* parallel_terminate_count is done before the store to in_use
> +*/
>
> Does the order actually matter here?
>

I think so.  If the store to slot->in_use is reordered before the check of
is_parallel_worker, then it is possible that a concurrent registration
of a worker marks is_parallel_worker as false before we check the
flag here.  See the explanation in my previous e-mail [1].


[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BQ_DdJ28qXYSHZiBKNf2MY1QFrv5XAOAw4ZXHw4TPMxA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Dilip Kumar
On Sat, Sep 17, 2016 at 6:54 AM, Tomas Vondra
 wrote:
> Although most of the changes probably do not matter much for unlogged
> tables (I planned to see how this affects regular tables, but as I see no
> difference for unlogged ones, I haven't done that yet).
>
> So the question is why Dilip sees +30% improvement, while my results are
> almost exactly the same. Looking at Dilip's benchmark, I see he only ran the
> test for 10 seconds, and I'm not sure how many runs he did, warmup etc.
> Dilip, can you provide additional info?

Actually, I ran the test for 10 minutes.

Sorry for the confusion (I copy-pasted my script, manually replaced
the variable, and made a mistake).

My script is like this:

scale_factor=300
shared_bufs=8GB
time_for_reading=600

./postgres -c shared_buffers=8GB -c checkpoint_timeout=40min -c
max_wal_size=20GB -c max_connections=300 -c maintenance_work_mem=1GB&
./pgbench -i -s $scale_factor --unlogged-tables postgres
./pgbench -c $threads -j $threads -T $time_for_reading -M prepared
postgres -f ../../script.sql >> test_results.txt

I am taking the median of three readings.

With the below script, I can repeat my results every time (a 15% gain
over head at 64 clients, and a 30% gain over head at 128+ clients).

I will repeat my test with 8, 16, and 32 clients and post the results soon.

> \set aid random(1, 30000000)
> \set tid random(1, 3000)
>
> BEGIN;
> SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
> SAVEPOINT s1;
> SELECT tbalance FROM pgbench_tellers WHERE tid = :tid for UPDATE;
> SAVEPOINT s2;
> SELECT abalance FROM pgbench_accounts WHERE aid = :aid for UPDATE;
> END;
> ---


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Amit Kapila
On Sat, Sep 17, 2016 at 9:17 AM, Tomas Vondra
 wrote:
> On 09/14/2016 05:29 PM, Robert Haas wrote:
>>
>> On Wed, Sep 14, 2016 at 12:55 AM, Dilip Kumar 
>> wrote:
>>>
>>> 2. Results
>>> ./pgbench -c $threads -j $threads -T 10 -M prepared postgres -f
>>> script.sql
>>> scale factor: 300
>>> Clients   head(tps)   grouplock(tps)   granular(tps)
>>> -------   ---------   --------------   -------------
>>> 128       29367       39326            37421
>>> 180       29777       37810            36469
>>> 256       28523       37418            35882
>>>
>>>
>>> grouplock --> 1) Group mode to reduce CLOGControlLock contention
>>> granular  --> 2) Use granular locking model
>>>
>>> I will test with 3rd approach also, whenever I get time.
>>>
>>> 3. Summary:
>>> 1. I can see that over head we are gaining ~30% performance at higher
>>> client counts (128 and beyond).
>>> 2. Group lock is ~5% better compared to granular lock.
>>
>>
>> Sure, but you're testing at *really* high client counts here.  Almost
>> nobody is going to benefit from a 5% improvement at 256 clients.  You
>> need to test 64 clients and 32 clients and 16 clients and 8 clients
>> and see what happens there.  Those cases are a lot more likely than
>> these stratospheric client counts.
>>
>
> Right. My impression from the discussion so far is that the patches only
> improve performance with very many concurrent clients - but as Robert points
> out, almost no one is running with 256 active clients, unless they have 128
> cores or so. At least not if they value latency more than throughput.
>

See, I am also not in favor of going with any of these patches if
they don't help in reducing contention.  However, I think it is
important to understand under what kind of workload and in which
environment they can show a benefit or a regression, whichever is
applicable.  Just FYI, a couple of days back one of EDB's partners, who
was doing performance tests using HammerDB (which is again an OLTP
focussed workload) on 9.5-based code, found that CLogControlLock
has significantly high contention.  They were using
synchronous_commit=off in their settings.  Now, it is quite possible
that with the improvements done in 9.6, the contention they are seeing
will be eliminated, but we have yet to figure that out.  I just shared
this information with you because this seems to be a
real problem, and we should try to work on it unless we are able to
convince ourselves that it is not.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Amit Kapila
On Sat, Sep 17, 2016 at 9:12 AM, Tomas Vondra
 wrote:
> On 09/17/2016 05:23 AM, Amit Kapila wrote:
>>
>> The difference between these and the tests performed by Dilip is that he
>> has fewer savepoints.  I think if you want to try it again, can
>> you do it once with either no savepoints or 1~2 savepoints.  The other
>> thing you could try is the same test as Dilip has done (with and
>> without 2 savepoints).
>>
>
> I don't follow. My understanding is the patches should make savepoints
> cheaper - so why would using fewer savepoints increase the effect of the
> patches?
>

Oh no, the purpose of the patch is not to make savepoints cheaper.  (I
know I earlier suggested checking with fewer savepoints, but that was
not intended to mean that this patch makes savepoints cheaper; rather,
it might show the difference between the different approaches.  Sorry
if that was not clearly stated earlier.)  The purpose of these
patches is to make commits cheaper, and in particular updating the
status in CLOG.  Let me try to explain briefly the CLOG
contention and what these patches try to accomplish.  As of head, when
we try to update the status in CLOG (TransactionIdSetPageStatus), we
take CLOGControlLock in EXCLUSIVE mode, read the appropriate
CLOG page (most of the time it will be in memory, so that is cheap), and
then update the transaction status in it.  We take
CLOGControlLock in SHARED mode (if the required CLOG page is in
memory; otherwise the lock is upgraded to EXCLUSIVE) while reading the
transaction status, which happens when we access a tuple whose hint
bits are not set.

So, we have two different types of contention around CLOGControlLock:
(a) all the transactions that try to commit at the same time have to
do it almost serially, and (b) readers of transaction status
contend with writers.

Now, with the patch that went into 9.6 (increasing the CLOG buffers),
the second type of contention is mostly reduced, as most of the required
pages are in memory, and we are hoping that this patch will help in
reducing the first type (a) of contention as well.
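
As an aside, on 9.6 this contention is easy to confirm with the new
wait-event columns in pg_stat_activity (waits on this lock show up as
wait_event 'CLogControlLock' under wait_event_type 'LWLockNamed').  A
minimal sketch, run while the benchmark is active:

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY 1, 2
ORDER BY 3 DESC;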

>>
>
> I'm OK with running Dilip's tests, but I'm not sure why there's not much
> point in running the tests I've done. Or perhaps I'd like to understand why
> "my tests" show no improvement whatsoever first - after all, they're not
> that different from Dilip's.
>

The test which Dilip is doing ("SELECT ... FOR UPDATE") mainly targets
the first type (a) of contention, as it doesn't change the hint bits, so
it should mostly not need to read the transaction status when
accessing the tuple.  Whereas the tests you are doing are mainly
focussed on the second type (b) of contention.
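
For the curious, whether the xmin-committed hint bit is already set on a
tuple can be inspected with the pageinspect extension (bit 0x0100 of
t_infomask is HEAP_XMIN_COMMITTED); a quick sketch, with the table name
assumed from the pgbench setup:

CREATE EXTENSION IF NOT EXISTS pageinspect;
SELECT lp, t_xmin, (t_infomask::int & 256) > 0 AS xmin_committed_hint
FROM heap_page_items(get_raw_page('pgbench_accounts', 0));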

I think one point we have to keep in mind here is that this contention
is visible in bigger socket m/c, last time Jesper also tried these
patches, but didn't find much difference in his environment and on
further analyzing (IIRC) we found that the reason was that contention
was not visible in his environment.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Tomas Vondra

On 09/14/2016 05:29 PM, Robert Haas wrote:

On Wed, Sep 14, 2016 at 12:55 AM, Dilip Kumar  wrote:

2. Results
./pgbench -c $threads -j $threads -T 10 -M prepared postgres -f script.sql
scale factor: 300
Clients   head(tps)   grouplock(tps)   granular(tps)
-------   ---------   --------------   -------------
128       29367       39326            37421
180       29777       37810            36469
256       28523       37418            35882


grouplock --> 1) Group mode to reduce CLOGControlLock contention
granular  --> 2) Use granular locking model

I will test with 3rd approach also, whenever I get time.

3. Summary:
1. I can see that over head we are gaining ~30% performance at higher
client counts (128 and beyond).
2. Group lock is ~5% better compared to granular lock.


Sure, but you're testing at *really* high client counts here.  Almost
nobody is going to benefit from a 5% improvement at 256 clients.  You
need to test 64 clients and 32 clients and 16 clients and 8 clients
and see what happens there.  Those cases are a lot more likely than
these stratospheric client counts.



Right. My impression from the discussion so far is that the patches only 
improve performance with very many concurrent clients - but as Robert 
points out, almost no one is running with 256 active clients, unless 
they have 128 cores or so. At least not if they value latency more than 
throughput.


So while it's nice to improve throughput in those cases, it's a bit like 
a tree falling in the forest without anyone around.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Tomas Vondra

On 09/17/2016 05:23 AM, Amit Kapila wrote:

On Sat, Sep 17, 2016 at 6:54 AM, Tomas Vondra
 wrote:

On 09/14/2016 06:04 PM, Dilip Kumar wrote:



...


(I've also run it with 100M rows, called "large" in the results), and
pgbench is running this transaction:

\set id random(1, 100000)

BEGIN;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s1;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s2;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s3;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s4;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s5;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s6;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s7;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s8;
COMMIT;

So 8 simple UPDATEs interleaved by savepoints.



The difference between these and the tests performed by Dilip is that he
has fewer savepoints.  I think if you want to try it again, can
you do it once with either no savepoints or 1~2 savepoints.  The other
thing you could try is the same test as Dilip has done (with and
without 2 savepoints).



I don't follow. My understanding is the patches should make savepoints 
cheaper - so why would using fewer savepoints increase the effect of the 
patches?


FWIW I've already done a quick test with 2 savepoints, no difference. I 
can do a full test of course.



The benchmark was running on
a machine with 256GB of RAM, 32 cores (4x E5-4620) and a fairly large SSD
array. I'd done some basic tuning on the system, most importantly:

effective_io_concurrency = 32
work_mem = 512MB
maintenance_work_mem = 512MB
max_connections = 300
checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
max_wal_size = 128GB
min_wal_size = 16GB
shared_buffers = 16GB

Although most of the changes probably do not matter much for unlogged
tables (I planned to see how this affects regular tables, but as I see no
difference for unlogged ones, I haven't done that yet).



You are right.  Unless we see a benefit with unlogged tables,
there is no point in doing it for regular tables.


So the question is why Dilip sees +30% improvement, while my results are
almost exactly the same. Looking at Dilip's benchmark, I see he only ran the
test for 10 seconds, and I'm not sure how many runs he did, warmup etc.
Dilip, can you provide additional info?

I'll ask someone else to redo the benchmark after the weekend to make sure
it's not actually some stupid mistake of mine.



I think there is not much point in repeating the tests you have
done; rather, it would be better to try the tests done by Dilip
in your environment and see the results.



I'm OK with running Dilip's tests, but I'm not sure why there's not much 
point in running the tests I've done. Or perhaps I'd like to understand 
why "my tests" show no improvement whatsoever first - after all, they're 
not that different from Dilip's.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Amit Kapila
On Sat, Sep 17, 2016 at 6:54 AM, Tomas Vondra
 wrote:
> On 09/14/2016 06:04 PM, Dilip Kumar wrote:
>>
>> On Wed, Sep 14, 2016 at 8:59 PM, Robert Haas 
>> wrote:
>>>
>>> Sure, but you're testing at *really* high client counts here.  Almost
>>> nobody is going to benefit from a 5% improvement at 256 clients.
>>
>>
>> I agree with your point, but here we need to consider one more thing:
>> over head we are gaining ~30% with both approaches.
>>
>> So for comparing these two patches we can consider:
>>
>> A.  Other workloads (one can be as below)
>>-> Load on CLogControlLock at commit (exclusive mode) + Load on
>> CLogControlLock at Transaction status (shared mode).
>>I think we can mix (savepoint + updates)
>>
>> B. Simplicity of the patch (if both are performing almost equal in all
>> practical scenarios).
>>
>> C. Based on the algorithm, whichever seems the winner.
>>
>> I will try to test these patches with other workloads...
>>
>>>  You
>>> need to test 64 clients and 32 clients and 16 clients and 8 clients
>>> and see what happens there.  Those cases are a lot more likely than
>>> these stratospheric client counts.
>>
>>
>> I tested with 64 clients as well:
>> 1. Over head we are gaining ~15% with both patches.
>> 2. But group lock vs. granular lock is almost the same.
>>
>
>
> The transaction is using a single unlogged table initialized like this:
>
> create unlogged table t(id int, val int);
> insert into t select i, i from generate_series(1,100000) s(i);
> vacuum t;
> create index on t(id);
>
> (I've also run it with 100M rows, called "large" in the results), and
> pgbench is running this transaction:
>
> \set id random(1, 100000)
>
> BEGIN;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s1;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s2;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s3;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s4;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s5;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s6;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s7;
> UPDATE t SET val = val + 1 WHERE id = :id;
> SAVEPOINT s8;
> COMMIT;
>
> So 8 simple UPDATEs interleaved by savepoints.
>

The difference between these and the tests performed by Dilip is that he
has fewer savepoints.  I think if you want to try it again, can
you do it once with either no savepoints or 1~2 savepoints.  The other
thing you could try is the same test as Dilip has done (with and
without 2 savepoints).

> The benchmark was running on
> a machine with 256GB of RAM, 32 cores (4x E5-4620) and a fairly large SSD
> array. I'd done some basic tuning on the system, most importantly:
>
> effective_io_concurrency = 32
> work_mem = 512MB
> maintenance_work_mem = 512MB
> max_connections = 300
> checkpoint_completion_target = 0.9
> checkpoint_timeout = 3600
> max_wal_size = 128GB
> min_wal_size = 16GB
> shared_buffers = 16GB
>
> Although most of the changes probably does not matter much for unlogged
> tables (I planned to see how this affects regular tables, but as I see no
> difference for unlogged ones, I haven't done that yet).
>

You are right.  Unless we see a benefit with unlogged tables,
there is no point in doing it for regular tables.

> So the question is why Dilip sees +30% improvement, while my results are
> almost exactly the same. Looking at Dilip's benchmark, I see he only ran the
> test for 10 seconds, and I'm not sure how many runs he did, warmup etc.
> Dilip, can you provide additional info?
>
> I'll ask someone else to redo the benchmark after the weekend to make sure
> it's not actually some stupid mistake of mine.
>

I think there is not much point in repeating the tests you have done;
rather, it would be better to try the tests done by Dilip in
your environment and see the results.

Thanks for doing the tests.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2016-09-16 Thread Tomas Vondra

On 09/14/2016 06:04 PM, Dilip Kumar wrote:

On Wed, Sep 14, 2016 at 8:59 PM, Robert Haas  wrote:

Sure, but you're testing at *really* high client counts here.  Almost
nobody is going to benefit from a 5% improvement at 256 clients.


I agree with your point, but here we need to consider one more thing:
over head we are gaining ~30% with both approaches.

So for comparing these two patches we can consider:

A.  Other workloads (one can be as below)
   -> Load on CLogControlLock at commit (exclusive mode) + Load on
CLogControlLock at Transaction status (shared mode).
   I think we can mix (savepoint + updates)

B. Simplicity of the patch (if both are performing almost equal in all
practical scenarios).

C. Based on the algorithm, whichever seems the winner.

I will try to test these patches with other workloads...


 You
need to test 64 clients and 32 clients and 16 clients and 8 clients
and see what happens there.  Those cases are a lot more likely than
these stratospheric client counts.


I tested with 64 clients as well:
1. Over head we are gaining ~15% with both patches.
2. But group lock vs. granular lock is almost the same.



I've been doing some testing too, but I haven't managed to measure any 
significant difference between master and any of the patches. Not sure 
why; I've repeated the test from scratch to make sure I haven't done 
anything stupid, but I got the same results (which is one of the main 
reasons why the testing took me so long).


Attached is an archive with a script running the benchmark (including 
SQL scripts generating the data and custom transaction for pgbench), and 
results in a CSV format.


The benchmark is fairly simple - for each case (master + 3 different 
patches) we do 10 runs, 5 minutes each, for 32, 64, 128 and 192 clients 
(the machine has 32 physical cores).


The transaction is using a single unlogged table initialized like this:

create unlogged table t(id int, val int);
insert into t select i, i from generate_series(1,100000) s(i);
vacuum t;
create index on t(id);

(I've also run it with 100M rows, called "large" in the results), and 
pgbench is running this transaction:


\set id random(1, 100000)

BEGIN;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s1;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s2;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s3;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s4;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s5;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s6;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s7;
UPDATE t SET val = val + 1 WHERE id = :id;
SAVEPOINT s8;
COMMIT;

So 8 simple UPDATEs interleaved by savepoints. The benchmark was running 
on a machine with 256GB of RAM, 32 cores (4x E5-4620) and a fairly large 
SSD array. I'd done some basic tuning on the system, most importantly:


effective_io_concurrency = 32
work_mem = 512MB
maintenance_work_mem = 512MB
max_connections = 300
checkpoint_completion_target = 0.9
checkpoint_timeout = 3600
max_wal_size = 128GB
min_wal_size = 16GB
shared_buffers = 16GB

Although most of the changes probably do not matter much for unlogged 
tables (I planned to see how this affects regular tables, but as I see 
no difference for unlogged ones, I haven't done that yet).


So the question is why Dilip sees +30% improvement, while my results are 
almost exactly the same. Looking at Dilip's benchmark, I see he only ran 
the test for 10 seconds, and I'm not sure how many runs he did, warmup 
etc. Dilip, can you provide additional info?


I'll ask someone else to redo the benchmark after the weekend to make 
sure it's not actually some stupid mistake of mine.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


clog.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2016-09-16 Thread Michael Paquier
On Sat, Sep 17, 2016 at 8:11 AM, Andres Freund  wrote:
> Hi,
>
> On 2016-09-16 22:33:50 +0900, Michael Paquier wrote:
>> As supporting all platforms at once with cmake is going to be
>> uncommitable, we are going to need both methods able to live together
>> for a while.
>
> I very strongly disagree with this.  It's way too likely that we end up
> releasing with multiple buildsystems that way.  I think we'll have to
> require support for the most common OSs, and then fix up the fallout
> afterwards.

OK, then if we go with the hard method, the switch patch had better
also remove all the Makefiles in the tree and recommend running cmake
from the base root as the default build method. By the way, there will
be a strong need for docs in the patch sooner or later...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2016-09-16 Thread Andres Freund
Hi,

On 2016-09-16 22:33:50 +0900, Michael Paquier wrote:
> As supporting all platforms at once with cmake is going to be
> uncommitable, we are going to need both methods able to live together
> for a while.

I very strongly disagree with this.  It's way too likely that we end up
releasing with multiple buildsystems that way.  I think we'll have to
require support for the most common OSs, and then fix up the fallout
afterwards.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: About CMake v2

2016-09-16 Thread Michael Paquier
On Sat, Sep 17, 2016 at 1:40 AM, Yury Zhuravlev
 wrote:
> Michael Paquier wrote:
> I merged master into my branch and spent time porting all the changes. I hope
> to send a patch over the weekend without terrible flaws.

By the way, I noticed that you did not register this patch in the current CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recovery_min_apply-delay and remote_apply

2016-09-16 Thread Thomas Munro
On Sat, Sep 17, 2016 at 8:45 AM, Bernd Helmle  wrote:
> Current PostgreSQL Documentation on recovery.conf has this about
> recovery_min_apply_delay[1]:
>
> ---<---
>
> This parameter is intended for use with streaming replication deployments;
> however, if the parameter is specified it will be honored in all cases.
> Synchronous replication is not affected by this setting because there is
> not yet any setting to request synchronous apply of transaction commits.
>
> --->---
>
> If I understand correctly, this is no longer true with 9.6, where
> remote_apply has exactly the behavior the paragraph above rules out:
> any transaction executed with synchronous_commit=remote_apply
> will wait at least recovery_min_apply_delay to finish. Given that
> synchronous_commit can be controlled by any user, this might be dangerous
> if someone isn't careful enough.

Yes, I missed that sentence.  Thanks.

> I think we need a doc patch for that at least, see attached patch against
> master, but 9.6 should have a corrected one, too.

+1

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reminder: Call for Papers -- PGDay Austin 2016

2016-09-16 Thread Andres Freund
Hi,

On 2016-09-16 16:22:39 -0500, Jim Nasby wrote:
> Just a reminder / heads-up for those who don't follow -announce.

Uh, I'm against making hackers a duplicate of announce. If people don't
follow announce that's their choice.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Reminder: Call for Papers -- PGDay Austin 2016

2016-09-16 Thread Jim Nasby

Just a reminder / heads-up for those who don't follow -announce.


 Forwarded Message 
Subject:[ANNOUNCE] Call for Papers -- PGDay Austin 2016
Date:   Wed, 7 Sep 2016 19:54:22 +
From:   Debra Cerda 
To: pgsql-annou...@postgresql.org 



PGDay Austin 2016 is now accepting proposals for talks!


Join PostgreSQL users and advocates in Austin for a day of PostgreSQL at 
PGDay Austin 2016, on Saturday, November 12, 2016, at the Austin 
Galvanize campus (118 Nueces St. Austin, TX 78701).


Our theme is "Everything's Bigger in Postgres", with a focus on 
PostgreSQL's role in the community of "Big Data" and Data Science -- 
deep computing, analytics, and scalability. We encourage content for 
all knowledge/experience levels, and welcome content that speaks to 
users new to PostgreSQL as well as practical applications and case 
studies. We welcome any PostgreSQL contributors and users, as well as 
other external open source software communities including -- but not 
limited to -- Python, Django, Rails, JSON, and more to submit a proposal.


Each session will last 45 minutes, with an extra 10 minutes allotted 
for Q&A. Depending on response, we may offer dual tracks in the 
afternoon as well as lightning talks. We are targeting up to 150 
attendees, both local and from across Texas and the U.S.


Submission deadline is Wednesday, September 21, 2016, by midnight CST, 
with notifications to be sent by Monday, September 26. Submit your 
proposal via this form.


Registration will open soon -- it'll be a wonderful opportunity to 
learn and network, and we'll have some great local food and refreshments 
provided for attendees, along with a few other surprises!


Sponsorship opportunities are also available and we encourage your 
support! Help us keep costs down for attendees as we work to onboard 
more PostgreSQL supporters, users, and developers, as well as other open 
source software communities.


Our Silent Auction will benefit ChickTech Austin, whose high school 
program kick-off weekend will be taking place a few blocks away from our 
event. We welcome donated items and services for this fundraiser.


Contact debra.ce...@bluetreble.com or @debra_cerda via the Postgres 
Slack channel for more information, including requests for the 
Sponsorship Prospectus.


Thank you to Galvanize Austin for their support as our venue sponsor.

Hosted by the Austin PostgreSQL User Group, an associate of the U.S. 
PostgreSQL Association.




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] recovery_min_apply-delay and remote_apply

2016-09-16 Thread Bernd Helmle
Current PostgreSQL Documentation on recovery.conf has this about
recovery_min_apply_delay[1]:

---<---

This parameter is intended for use with streaming replication deployments;
however, if the parameter is specified it will be honored in all cases.
Synchronous replication is not affected by this setting because there is
not yet any setting to request synchronous apply of transaction commits.

--->---

If I understand correctly, this is no longer true with 9.6, where
remote_apply has exactly the behavior the paragraph above rules out:
any transaction executed with synchronous_commit=remote_apply
will wait at least recovery_min_apply_delay to finish. Given that
synchronous_commit can be controlled by any user, this might be dangerous
if someone isn't careful enough.
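
To make that concrete, a minimal sketch (assuming a synchronous standby
whose recovery.conf sets recovery_min_apply_delay = '30min', and a
hypothetical table t):

SET synchronous_commit = remote_apply;
BEGIN;
INSERT INTO t VALUES (1);
COMMIT;  -- waits until the (deliberately delayed) standby applies this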

I think we need a doc patch for that at least, see attached patch against
master, but 9.6 should have a corrected one, too.

[1] 


-- 
Thanks

Bernd

recovery-config-doc.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more parallel query documentation

2016-09-16 Thread Alvaro Herrera
Robert Haas wrote:

> Hey, everybody: I intended to add this to the documentation before 9.6
> went out, but that didn't get done.  Maybe it'll have to happen later
> at this point, but can I get some advice on WHERE in the documentation
> this stuff could be added?  Assuming people agree it should be added?
> The major subsections of the documentation are "Tutorial", "The SQL
> Language", "Server Administration", "Client Interfaces", "Server
> Programming", "Reference",  "Internals", and "Appendixes", and it's
> not clear to me that parallel query fits very well into any of those
> categories.

I agree it should be added.  I suggest that it could even be added to
the 9.6 docs, if you can make it.

I think the sections "Tutorial" and "The SQL Language" are the most
reasonable places.  The latter seems to be exclusively about how to word
the queries rather than how they are executed, though adding a new
section before or after "Performance Tips" seems not completely
off-topic.

The "Tutorial" seems somewhat more than a tutorial these days, but it
seems much more lighter reading than what you have in that wiki page
anyway.  Perhaps it would be okay to add some simple text in the
"Advanced Features" section, and elaborate in the "The SQL Language"
chapter.

(Aside: it seems strange to have a "The SQL Language" section inside the
"Tutorial" chapter and a separate "The SQL Language" chapter.)

I gave a quick look to https://wiki.postgresql.org/wiki/Parallel_Query.
I think it still reads a little strange: it doesn't say that parallel
query is implemented on top of bgworkers, yet very early on it suggests
that the max_parallel_degree value depends on the max_worker_processes
parameter, without explaining why.  I think that could be clearer.
Also, the blurb about VACUUM/CLUSTER looks like it belongs in the "When
can parallel query be used" section rather than the intro.

> I feel like we need a new major division for operational issues that
> don't qualify as server administration - e.g. query performance
> tuning, parallel query, how to decide what indexes to create...

I'm not opposed to this idea.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-09-16 Thread Julien Rouhaud
On 16/09/2016 20:24, Robert Haas wrote:
> On Wed, Jun 29, 2016 at 10:46 PM, Amit Kapila  wrote:
>> Your patch looks good to me and is ready for a committer's look.
>>
>> Notes for committer -
>> a. Verify whether the description of the newly added GUC max_parallel_workers
>> looks okay to you; Julien and I are not in 100% agreement on that.
>> b. Comments might need some improvement.
> 
> This patch needs to be rebased.  I hope somebody can volunteer to do
> that, because I'd like to commit it once we've hashed out the details.
> 

I just rebased the previous patch on current HEAD, with some other
modifications, see below (attached v8 if that helps).

> Would it bother anybody very much if we bumped up these values, say by
> increasing max_worker_processes from 8 to 16 and max_parallel_workers
> from 4 (as it is in the current patch version) to 8?  I feel like 4 is
> a bit more conservative than I'd like to be by default, and I'd like
> to make sure that we leave room for other sorts of background workers
> between the two limits.
> 

That's fine by me.  Should this be done (if there's no objection) in the
same patch, or in another one?


> I'd suggest renaming the "parallel" flag to BackgroundWorkerSlot to
> "is_parallel_worker".  Or, actually, what I think would be better is
> to give it a name like worker_class, and then we can have
> BGWORKER_CLASS_PARALLEL and perhaps eventually
> BGWORKER_CLASS_REPLICATION, etc.
> 

For now I just renamed "parallel" to "is_parallel_worker", since this is
straightforward.  For a new "worker_class", I guess we'd need a new enum
stored in the BackgroundWorker struct instead of the
BGWORKER_IS_PARALLEL_WORKER flag, and to store it in the
BackgroundWorkerSlot.  Should I do that instead?


> + * terminated ones.  These counters can of course overlaps, but it's not
> + * important here since the substraction will still give the right number.
> 
> overlaps -> overflow.  substraction -> subtraction.
> 

oops sorry, fixed

> +   /*
> +* We need a write barrier to make sure the update of
> +* parallel_terminate_count is done before the store to in_use
> +*/
> 
> Does the order actually matter here?
> 

After some more thinking, it looks like a reordering here won't have any
impact. I'll remove it, unless Amit has an objection.

> +   {"max_parallel_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
> +   gettext_noop("Sets the maximum number of
> parallel processes for the cluster."),
> 
> I suggest: sets the maximum number of parallel workers that can be
> active at one time.
> 

changed
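
For reviewers, a quick usage sketch of how the new GUC is meant to
interact with the existing ones (semantics as described in this patch):

-- cluster-wide pool of background workers (change requires restart)
ALTER SYSTEM SET max_worker_processes = 16;
-- cap on how many of those may be used as parallel workers (new GUC)
ALTER SYSTEM SET max_parallel_workers = 8;
-- per-Gather-node limit, effectively bounded by both of the above
SET max_parallel_workers_per_gather = 2;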

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cd66abc..3abd2e5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2005,8 +2005,9 @@ include_dir 'conf.d'
  Sets the maximum number of workers that can be started by a single
  Gather node.  Parallel workers are taken from the
  pool of processes established by
- .  Note that the requested
- number of workers may not actually be available at run time.  If this
+ , limited by
+ .  Note that the requested
+ number of workers may not actually be available at runtime.  If this
  occurs, the plan will run with fewer workers than expected, which may
  be inefficient.  The default value is 2.  Setting this value to 0
  disables parallel query execution.
@@ -2030,6 +2031,21 @@ include_dir 'conf.d'

   
 
+  
+   max_parallel_workers (integer)
+   
+max_parallel_workers configuration 
parameter
+   
+   
+   
+
+ Sets the maximum number of workers that the system can support for
+ parallel queries.  The default value is 4.  Setting this value to 0
+ disables parallel query execution.
+
+   
+  
+
   
backend_flush_after (integer)

diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index cde0ed3..5f6e429 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -453,7 +453,8 @@ LaunchParallelWorkers(ParallelContext *pcxt)
snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker for PID %d",
 MyProcPid);
worker.bgw_flags =
-   BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
+   BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION
+   | BGWORKER_IS_PARALLEL_WORKER;
worker.bgw_start_time = BgWorkerStart_ConsistentState;
worker.bgw_restart_time = BGW_NEVER_RESTART;
worker.bgw_main = ParallelWorkerMain;
diff --git a/src/backend/optimizer/path/allpaths.c 
b/src/backend/optimizer/path/allpaths.c
index 99b6bc8..8f33813 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -718,9 +718,

Re: [HACKERS] Hash Indexes

2016-09-16 Thread Jesper Pedersen

On 09/16/2016 03:18 AM, Amit Kapila wrote:

Attached is a run with 1000 rows.



I think 1000 is also too few; you probably want to run it with 100,000 or
more rows.  I suspect that the reason why you are seeing the large
difference between btree and hash index is that the range of values is
narrow and there may be many overflow pages.



Attached is 100,000.


I think for CHI it would be Robert's and others' feedback. For WAL, there is
[1].



I have fixed your feedback for WAL and posted the patch.


Thanks!


I think the
remaining thing to handle for the Concurrent Hash Index patch is to remove
the usage of hashscan.c from the code if no one objects to it; do let me
know if I am missing something here.



Like Robert said, hashscan.c can always come back, and it would take a 
call-stack out of the 'am' methods.


Best regards,
 Jesper


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Indexes

2016-09-16 Thread Andres Freund
On 2016-09-16 09:12:22 -0700, Jeff Janes wrote:
> On Thu, Sep 15, 2016 at 7:23 AM, Andres Freund  wrote:
> > One earlier question about this is whether that is actually a worthwhile
> > goal.  Are the speed and space benefits big enough in the general case?
> > Could those benefits not be achieved in a more maintainable manner by
> > adding a layer that uses a btree over hash(columns), and adds
> > appropriate rechecks after heap scans?
> >
> > Note that I'm not saying that hash indexes are not worthwhile, I'm just
> > doubtful that question has been explored sufficiently.

> I think that exploring it well requires good code.  If the code is good,
> why not commit it?

Because getting there requires a lot of effort, debugging it afterwards
would take effort, and maintaining it would also take a fair amount?
Adding code isn't free.

I'm rather unenthused about having a hash index implementation that's
mildly better in some corner cases, but otherwise doesn't have much
benefit. That'll mean we'll have to step up our user education a lot,
and we'll have to maintain something for little benefit.
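
For what it's worth, the layered alternative quoted above can already be
approximated in plain SQL with an expression index (hashtext() is used
purely as an illustration; a real layer would handle the recheck
transparently):

CREATE TABLE t (k text, v int);
CREATE INDEX t_k_hash_idx ON t (hashtext(k));
-- equality lookup via the btree over the hash, with an explicit recheck on k
SELECT * FROM t WHERE hashtext(k) = hashtext('foo') AND k = 'foo';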

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] more parallel query documentation

2016-09-16 Thread Robert Haas
On Thu, Apr 21, 2016 at 9:16 PM, Amit Langote
 wrote:
> On 2016/04/15 12:02, Robert Haas wrote:
>> As previously threatened, I have written some user documentation for
>> parallel query.  I put it up here:
>>
>> https://wiki.postgresql.org/wiki/Parallel_Query
>>
>> This is not totally comprehensive and I can think of a few more
>> details that could be added, but it's a pretty good high-level
>> overview of what got into 9.6.  After this has gotten some feedback
>> and polishing, I would like to add a version of this to the SGML
>> documentation somewhere.  I am not sure where it would fit, but I
>> think it's good to document stuff like this.
>
> Looking at the "Function Labeling For Parallel Safety" section.  There is
> a sentence:
>
> "Functions must be marked PARALLEL UNSAFE if they write to the database,
> access sequences, change the transaction state even temporarily (e.g. a
> PL/pgsql function which establishes an EXCEPTION block to catch errors),
> or make persistent changes to settings."
>
> Then looking at the "postgres_fdw vs. force_parallel_mode on ppc" thread
> [1], I wonder if a note along the lines of "or a function that creates *new*
> connection(s) to remote server(s)" may be in order.  Overkill?

That's not necessarily parallel-unsafe.  It's probably
parallel-restricted at most.

Hey, everybody: I intended to add this to the documentation before 9.6
went out, but that didn't get done.  Maybe it'll have to happen later
at this point, but can I get some advice on WHERE in the documentation
this stuff could be added?  Assuming people agree it should be added?
The major subsections of the documentation are "Tutorial", "The SQL
Language", "Server Administration", "Client Interfaces", "Server
Programming", "Reference",  "Internals", and "Appendixes", and it's
not clear to me that parallel query fits very well into any of those
categories.  I suppose "Internals" is closest, but that's mostly stuff
that typical users won't care about, whereas what I'm trying to
document here is what parallel query will look like from a user
perspective, not how it looks under the hood.  I feel like we need a
new major division for operational issues that don't qualify as server
administration - e.g. query performance tuning, parallel query, how to
decide what indexes to create...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rename max_parallel_degree?

2016-09-16 Thread Robert Haas
On Wed, Jun 29, 2016 at 10:46 PM, Amit Kapila  wrote:
> Your patch looks good to me and is ready for a committer's look.
>
> Notes for committer -
> a. Verify whether the description of the newly added GUC max_parallel_workers
> looks okay to you; Julien and I are not in 100% agreement on that.
> b. Comments might need some improvement.

This patch needs to be rebased.  I hope somebody can volunteer to do
that, because I'd like to commit it once we've hashed out the details.

Would it bother anybody very much if we bumped up these values, say by
increasing max_worker_processes from 8 to 16 and max_parallel_workers
from 4 (as it is in the current patch version) to 8?  I feel like 4 is
a bit more conservative than I'd like to be by default, and I'd like
to make sure that we leave room for other sorts of background workers
between the two limits.

I'd suggest renaming the "parallel" flag to BackgroundWorkerSlot to
"is_parallel_worker".  Or, actually, what I think would be better is
to give it a name like worker_class, and then we can have
BGWORKER_CLASS_PARALLEL and perhaps eventually
BGWORKER_CLASS_REPLICATION, etc.

+ * terminated ones.  These counters can of course overlaps, but it's not
+ * important here since the substraction will still give the right number.

overlaps -> overflow.  substraction -> subtraction.

+   /*
+* We need a write barrier to make sure the update of
+* parallel_terminate_count is done before the store to in_use
+*/

Does the order actually matter here?

+   {"max_parallel_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
+   gettext_noop("Sets the maximum number of
parallel processes for the cluster."),

I suggest: sets the maximum number of parallel workers that can be
active at one time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Andres Freund
Hi,

On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
> > Yikes, that's a pretty absurd implementation.
>
> Not when you take into account that it's been written over 20 years ago ;)

Well, that doesn't mean it can't be fixed ;)

> > I'm not quite sure I understand why this an issue here - there shouldn't
> > ever be events on this fd, so why is the kernel waking up all processes?
> > It'd kinda makes sense it'd wake up all processes if there's one
> > waiting, but ... ?
>
> Every read is an event, and that's what PostmasterIsAlive does.

But in most places we only do a PostmasterIsAlive() check if WaitLatch
returns WL_POSTMASTER_DEATH.  The only walreceiver-related place that
doesn't is WalRcvWaitForStartPosition(). If that's indeed the cause of
your issues, this quite possibly could be fixed by doing the
if (!PostmasterIsAlive())
exit(1);
check not unconditionally, but only after the WaitLatch at the end of
the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch().
That'll be a minor behaviour change for the WALRCV_RESTARTING case, but
that seems fine; we'll just loop once more outside (after a quick glance
at least).

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] design for a partitioning feature (was: inheritance)

2016-09-16 Thread Robert Haas
On Mon, May 23, 2016 at 6:31 PM, Alvaro Herrera
 wrote:
> Tom Lane wrote:
>> My feeling about it is that we need to provide a partitioning feature
>> that doesn't rely on the current notion of inheritance at all.  We've
>> heard from multiple users who want to use large numbers of partitions,
>> enough that simply having a separate relcache entry for each partition
>> would be a performance problem, never mind the current approach to
>> planning queries over inheritance trees.  So the partitions need to be
>> objects much simpler than full-fledged tables.
>
> Sorry to hijack the thread, but I agree on this, and I'm worried that
> the patch being floated for partitioning may paint us on a corner from
> which it may be difficult to get out.

This seems like a serious concern that merits further discussion.
What would a partition that is "much simpler than a full-fledged
table" look like?

From my point of view, I don't see a real alternative to the direction
that Amit's partitioning patch takes.  But I'd be happy to be
convinced that I've got it all wrong, because I don't think it's going
to come close to fixing all of the problems that we have (though I
doubt that any one patch will do that).  However, let me lay out my
concerns so that, perhaps, somebody can suggest a better line of
thinking.

1. We can't assume that all partitions are the same for purposes of
query planning.  One of the big problems with the status quo at least
in my view is that, as Tom says, we end up with separate paths for
every partition, separate RelOptInfo structures for every partition,
separate relcache and catcache entries for each partition.  Planning
is slow and memory usage is high because all of the work that normally
needs to be done for each table now has to be done for each partition.
Worse, while a query isn't likely to involve more than a dozen or so
tables, and often less, each of those tables could have tens,
hundreds, or even thousands of partitions, so the constant multipliers
are quite high.  It would be far better if we could consider only the
parent for planning purposes, or at least during early stages of
planning, and all of the partitions only later.  However, I see no way
to do that without generating terrible query plans in some cases where
it's clearly possible to do far better, because the partitions need not
be the same size or have the same data distribution.  The optimal strategy for
scanning partition A may be different than for partition B.  Of course
it will *often* be true that the right strategy is the same for every
partition, but I think exceptions will be fairly common.
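
For context, the status quo being referred to is inheritance plus
constraint exclusion, where the planner expands every child into its own
RelOptInfo and prunes children by their CHECK constraints; a minimal
sketch (hypothetical table names):

create table measurement (id int, val int);
create table measurement_p1 (check (id >= 1 and id < 1000)) inherits (measurement);
create table measurement_p2 (check (id >= 1000 and id < 2000)) inherits (measurement);
set constraint_exclusion = partition;  -- the default setting
explain select * from measurement where id = 1500;
-- one RelOptInfo per child; measurement_p1 is pruned because its CHECK
-- constraint contradicts the WHERE clause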

More importantly, there are lots of really important query planner
optimizations that simply can't be done without considering each
partition separately.  For example, if you are joining two tables on
their partitioning key, you can do a partition-wise join rather than
joining the whole thing.  Every one of those joins may pick its own
strategy; e.g. the small partitions may use hash joins while the large
ones are forced to use merge joins due to memory constraints.  Some
partitions may be eliminated by constraints.  Or, to take an
optimization that we already have, getting sorted data from a family
of partitions is much more sensibly done with Merge Append than any
other strategy.  In either case, you can't devise or cost an
intelligent query plan without some understanding of the partition
structure.  At a minimum, you need to know the sizes of the various
partitioning and what sort of indexing is available.  In practice, you
probably want per-partition statistics and information about
constraints, too.  It's probably a bad idea to assume that every
partition has an equivalent data distribution.

2. TIDs are 6 bytes, and we have deeply ingrained assumptions about
what they mean.  So, you can't really have an index of the type that
we support today which slices across multiple partitions unless all of
those partitions share the same set of block numbers.  But what is the
physical layout of the underlying data?  If all of those block numbers
are in fact stored in a single relfilenode, you haven't really
partitioned the data and won't gain the anticipated benefits from the
ability to, say, drop a partition.  If they are stored in different
files, code that iterates over the range of available block numbers
will be performing random I/O rather than sequential I/O.  If it
weren't for those problems, one might imagine allowing cross-partition
indexes that all point into the same TID space, but as it is it seems
that would be require a lot of wrangling to make it work.  Also, it
doubles down on the 32TB table size limit, which is not as irrelevant
as it used to be.  I think if we want cross-partition indexes we
should approach that problem directly by having indexes that contain
8-byte ETIDs instead of 6-byte TIDs, with the extra 2 bytes storing a
partition designator - or something of that sort.

But I can't really see holding up the further dev

Re: [HACKERS] Speedup twophase transactions

2016-09-16 Thread Stas Kelvich
> On 07 Sep 2016, at 11:07, Stas Kelvich  wrote:
> 
>> On 07 Sep 2016, at 03:09, Michael Paquier  wrote:
>> 
 On 06 Sep 2016, at 12:03, Michael Paquier  
 wrote:
 
 On Tue, Sep 6, 2016 at 5:58 PM, Stas Kelvich  
 wrote:
> Oh, I was preparing a new version of the patch after a fresh look at it.
> Probably I should have
> said that in this topic. I've found a bug in subtransaction handling and
> am now working
> on a fix.
 
 What's the problem actually?
>>> 
>>> Handling of xids_p array in PrescanPreparedTransactions() is wrong for 
>>> prepared tx's in memory.
>>> Also I want to double-check and add comments to RecoveryInProgress() checks 
>>> in FinishGXact.
>>> 
>>> I'll post a reworked patch in several days.
>> 
>> Could you use as a base the version I just sent yesterday then? I
>> noticed many mistakes in the comments, for example many s/it's/its/
>> and did a couple of adjustments around the code, the goto next_file
>> was particularly ugly. That will be that much work not to do again
>> later.
> 
> Yes. I already merged my branch with your version.

Here is the updated version of the patch.

Looking through the old version I've noted a few things that were bothering me:

* In case of crash replay, PREPARE redo accesses SUBTRANS, but
StartupSUBTRANS() isn't called yet in StartupXLOG().
* Several functions in twophase.c have to walk through both PGPROC and
the pg_twophase directory.

Now I have slightly changed the order of things in StartupXLOG(), so the
twophase state is loaded from file and StartupSUBTRANS is called before
actual recovery starts. So in all other functions we can be sure that the
files were already loaded into memory.

Also, since 2PC transactions are now dumped to files only on checkpoint,
we can get rid of PrescanPreparedTransactions() -- all the necessary info
can reside in the checkpoint itself. I've changed the behaviour of the
oldestActiveXid write in checkpoints, and that's probably a discussable
topic, but ISTM that it simplifies the recovery logic a lot in both
twophase.c and xlog.c. More comments on that in the patch itself.
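
For readers less familiar with this code path, the on-disk state being
discussed is created by the standard two-phase commands (hypothetical
table t; requires max_prepared_transactions > 0):

BEGIN;
INSERT INTO t VALUES (1);
PREPARE TRANSACTION 'tx1';  -- prepared state must survive a crash; with
                            -- this patch it reaches pg_twophase only at
                            -- checkpoint time
COMMIT PREPARED 'tx1';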



twophase_replay.v7.patch
Description: Binary data

-- 
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2016-09-16 Thread Masahiko Sawada
On Fri, Sep 9, 2016 at 6:23 PM, Simon Riggs  wrote:
> On 8 September 2016 at 10:26, Masahiko Sawada  wrote:
>
>> "k (n1, n2, n3)" == "first k (n1, n2, n3)" doesn't break backward
>> compatibility but most users would think "k(n1, n2, n3)" as quorum
>> after introduced quorum.
>> I wish we can change the s_s_names syntax of 9.6 to "first k(n1, n2,
>> n3)" style before 9.6 releasing if we got consensus.
>
> Let's see the proposed patch, so we can evaluate the proposal.
>

Attached are 2 patches.
The 000 patch changes the syntax of s_s_names from 'k (n1, n2, n3)' to
'First k (n1, n2, n3)' for PG9.6.
The 001 patch adds quorum commit using the syntax 'Any k (n1, n2, n3)' for PG10.

Since we have already released 9.6RC1, I understand that it's quite hard
to change the syntax of 9.6.
But considering that we will support quorum commit, this could be one of
the solutions to avoid breaking backward compatibility and to provide a
useful user interface.
So I attached these patches.
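
For illustration, under these patches the two syntaxes would be set like
this (the keywords are case-insensitive per the 000 patch; 'ANY' is the
spelling proposed for PG10):

-- 000: priority-based, with the explicit FIRST keyword
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 2 (s1, s2, s3)';
-- 001: quorum commit proposed for PG10
ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (s1, s2, s3)';
SELECT pg_reload_conf();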

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
commit bd18dda9be5ab0341eca81de3c48ec6f7466dded
Author: Masahiko Sawada 
Date:   Fri Sep 16 15:32:24 2016 -0700

Change syntax

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cd66abc..f0f510c 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3037,7 +3037,7 @@ include_dir 'conf.d'
 This parameter specifies a list of standby servers using
 either of the following syntaxes:
 
-num_sync ( standby_name [, ...] )
+FIRST num_sync ( standby_name [, ...] )
 standby_name [, ...]
 
 where num_sync is
@@ -3048,7 +3048,9 @@ include_dir 'conf.d'
 3 (s1, s2, s3, s4) makes transaction commits wait
 until their WAL records are received by three higher-priority standbys
 chosen from standby servers s1, s2,
-s3 and s4.
+s3 and s4. FIRST is
+case-insensitive but the standby having name FIRST
+must be double-quoted.
 
 
 The second syntax was used before PostgreSQL
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 06f49db..84ccb6e 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1150,7 +1150,7 @@ primary_slot_name = 'node_a_slot'
 An example of synchronous_standby_names for multiple
 synchronous standbys is:
 
-synchronous_standby_names = '2 (s1, s2, s3)'
+synchronous_standby_names = 'FIRST 2 (s1, s2, s3)'
 
 In this example, if four standby servers s1, s2,
 s3 and s4 are running, the two standbys
diff --git a/src/backend/replication/Makefile b/src/backend/replication/Makefile
index c99717e..da8bcf0 100644
--- a/src/backend/replication/Makefile
+++ b/src/backend/replication/Makefile
@@ -26,7 +26,7 @@ repl_gram.o: repl_scanner.c
 
 # syncrep_scanner is complied as part of syncrep_gram
 syncrep_gram.o: syncrep_scanner.c
-syncrep_scanner.c: FLEXFLAGS = -CF -p
+syncrep_scanner.c: FLEXFLAGS = -CF -p -i
 syncrep_scanner.c: FLEX_NO_BACKUP=yes
 
 # repl_gram.c, repl_scanner.c, syncrep_gram.c and syncrep_scanner.c
diff --git a/src/backend/replication/syncrep_gram.y b/src/backend/replication/syncrep_gram.y
index 35c2776..b6d2f6c 100644
--- a/src/backend/replication/syncrep_gram.y
+++ b/src/backend/replication/syncrep_gram.y
@@ -46,7 +46,7 @@ static SyncRepConfigData *create_syncrep_config(const char *num_sync,
 	SyncRepConfigData *config;
 }
 
-%token  NAME NUM JUNK
+%token  NAME NUM JUNK FIRST
 
 %type  result standby_config
 %type  standby_list
@@ -61,7 +61,7 @@ result:
 
 standby_config:
 		standby_list{ $$ = create_syncrep_config("1", $1); }
-		| NUM '(' standby_list ')'	{ $$ = create_syncrep_config($1, $3); }
+		| FIRST NUM '(' standby_list ')'	{ $$ = create_syncrep_config($2, $4); }
 	;
 
 standby_list:
@@ -70,7 +70,7 @@ standby_list:
 	;
 
 standby_name:
-		NAME		{ $$ = $1; }
+		NAME		{ $$ = $1;}
 		| NUM		{ $$ = $1; }
 	;
 %%
diff --git a/src/backend/replication/syncrep_scanner.l b/src/backend/replication/syncrep_scanner.l
index d20662e..9dbdfbc 100644
--- a/src/backend/replication/syncrep_scanner.l
+++ b/src/backend/replication/syncrep_scanner.l
@@ -64,6 +64,11 @@ xdinside		[^"]+
 %%
 {space}+	{ /* ignore */ }
 
+first		{
+yylval.str = pstrdup(yytext);
+return FIRST;
+		}
+
 {xdstart}	{
 initStringInfo(&xdbuf);
 BEGIN(xd);
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f0f510c..0ad06ad 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3021,44 +3021,68 @@ include_dir 'conf.d'
 There will be one or more active synchronous standbys;
 transactions waiting for commit will be allowed to proceed after
 these standby servers confirm receipt of their data.
-The synchronous standbys will be those whose names appear
-earlier in this list, and
-that are both currently connected and streaming data in real-time
-  

Re: [HACKERS] WIP: About CMake v2

2016-09-16 Thread Yury Zhuravlev

Michael Paquier wrote:

Your patch recommends to build with cmake after creating build/, now I
would expect most users to run cmake from the root folder. However
this causes all the Makefiles to get overwritten. As supporting all
platforms at once with cmake is going to be uncommitable, we are going
to need both methods able to live together for a while. Well, they can
coexist with this patch as long as cmake is not run from the root of
the code tree, which is acceptable for me as long as the switch is not
completed. However others may think differently on the matter.


It's really a good point. Forbidding cmake runs from the root is the better
decision for me (at least for a start).
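
For example, a minimal sketch of the out-of-source workflow the patch
expects (the build/ directory name is just the convention from the patch):

mkdir build && cd build
cmake ..
make && make install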


Instead of getting support for all existing platforms, I would
recommend as well focusing only on one platform for the time being,
Linux, and get the work done correctly for that first. Once there is
something committed, we will be able to patch the buildfarm, and get
machines to switch to cmake one by one. After those are switched, we
could extend that.


You mean that in the first version of the patch I can focus on Linux systems?


Another point of contention is support for
extensions. How long should we keep support for the existing PGXS? How
external extensions would compile with the new thing infrastructure?


As long as possible. I hope I can write a PGXS Makefiles generator.


Which brings me to another point, your patch is focused on features,
meaning that per-OS checks are all grouped by feature, but it may be a
good idea to split checks by OS if necessary, with for example
per-platform files and scripts in cmake/os/. And we could have just
something for Linux now.


Currently I do not have a lot of OS-specific tests. All checks are done in
the same manner.


- root's .gitignore needs to add entries for CMakeFiles and
cmake_install.cmake. You need as well to ignore CMakeCache


Thanks, I did this in the last commit.


- A couple of headers are generated, like cubeparse.h (?)


Because bison generates a header by default. I suppose the author of cube
launched bison by hand, but I made it automatic.


- Currently a lot of users do things like that:
cd src/test/regress/ && make check
But this patch breaks that, and that's not cool. Recovery tests in
src/test/regress won't run either.


It seems to be a restriction by design, because in CMake you have only one
entry point.


That's all I have for now. Looking forward to seeing some progress here.


I merged master into my branch and spent time porting all the changes. I
hope to send a patch over the weekend without terrible flaws.


--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Bug in to_timestamp().

2016-09-16 Thread Artur Zakirov

On 25.08.2016 13:26, amul sul wrote:

Thanks. I've created the entry in https://commitfest.postgresql.org/10/713/
. You can add yourself as a reviewer.



Done, added myself as reviewer & changed status to "Ready for
Committer". Thanks !

Regards,
Amul Sul




It seems there is no need to rebase the patches. But I will not be able to
fix the patches for the next two weeks because of travel, should someone
have a note or remark.


So it would be nice if someone were able to fix possible issues, for the
above reason.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Sachin Kotwal
Hi Tom,

What I understood from this
https://www.postgresql.org/docs/9.5/static/explicit-locking.html#TABLE-LOCK-COMPATIBILITY
is :

RowExclusiveLock conflicts with queries that want SHARE, SHARE ROW EXCLUSIVE,
EXCLUSIVE, or ACCESS EXCLUSIVE locks.

In one of our customer environments we want to do some DDL operations every
day through cron jobs. These cron jobs get blocked by the RowExclusiveLock
taken by an UPDATE query.  And then many more queries end up waiting on this
cron job, as the SQL statements under the cron job hold ACCESS EXCLUSIVE
locks on tables involved in other SELECT queries.


If we cannot reduce locking in the partition scenario, then it is fine. We
can consider this a limitation of PostgreSQL, or of any other RDBMS.


Regards,
Sachin

On Fri, Sep 16, 2016 at 7:41 PM, Tom Lane  wrote:

> Sachin Kotwal  writes:
> > Does it release locks after taking decision and then perform actual
> update
> > operation on partition table?
>
> No, there's no attempt to do that, and we're unlikely to consider doing so
> because it would result in more lock-table thrashing.  Why do you care?
> RowExclusiveLock does not block any normal DML operation, so there's no
> apparent benefit from releasing it early.
>
> regards, tom lane
>



-- 

Thanks and Regards,
Sachin Kotwal


Re: [HACKERS] README of hash index

2016-09-16 Thread Jeff Janes
On Fri, Sep 16, 2016 at 4:20 AM, Amit Kapila 
wrote:

> Currently README of hash module contain algorithms written in below form.
>
> The insertion algorithm is rather similar:
>
> pin meta page and take buffer content lock in shared mode
> loop:
> compute bucket number for target hash key
> release meta page buffer content lock
> if (correct bucket page is already locked)
> break
> release any existing bucket page lock (if a concurrent split happened)
> take heavyweight bucket lock in shared mode
> retake meta page buffer content lock in shared mode
> -- (so far same as reader)
> release pin on metapage
> ..
> ..
>
> I have mostly updated them in the patches I have proposed to improve
> hash index.  However, each time I try to update them, I find that it
> is easy to follow the code than to read and understand the existing
> algorithm written in above form from README.
>
> Do others find it useful to maintain the algorithms in above form?
>

I think that having them all condensed into one place makes it easier to
think through the locking models to decide if there might be races or
deadlocks.  If you only care about the algorithm for inserting in
isolation, I agree reading the code might be better.

But the use of white space isn't always consistent, and I don't know what a
double hyphen means.  I think it could use improvement rather than
abolition.

Cheers,

Jeff


Re: [HACKERS] Hash Indexes

2016-09-16 Thread Jeff Janes
On Thu, Sep 15, 2016 at 7:23 AM, Andres Freund  wrote:

> Hi,
>
> On 2016-05-10 17:39:22 +0530, Amit Kapila wrote:
> > For making hash indexes usable in production systems, we need to improve
> > its concurrency and make them crash-safe by WAL logging them.
>
> One earlier question about this is whether that is actually a worthwhile
> goal.  Are the speed and space benefits big enough in the general case?
> Could those benefits not be achieved in a more maintainable manner by
> adding a layer that uses a btree over hash(columns), and adds
> appropriate rechecks after heap scans?
>
> Note that I'm not saying that hash indexes are not worthwhile, I'm just
> doubtful that question has been explored sufficiently.
>


I think that exploring it well requires good code.  If the code is good,
why not commit it?  I would certainly be unhappy to try to compare WAL
logged concurrent hash indexes to btree-over-hash indexes, if I had to wait
a few years for the latter to appear, and then dig up the patches for the
former and clean up the bitrot, and juggle multiple patch sets, in order to
have something to compare.

Cheers,

Jeff


Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-16 Thread Alex Ignatov


On 16.09.2016 16:50, Amit Kapila wrote:

On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov  wrote:

No, it doesn't.
It parallelizes neither SQL functions nor plpgsql ones.
Here is an example:

ipdr=> show max_worker_processes ;
 max_worker_processes
--
 128
(1 row)
ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set force_parallel_mode=on;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET



Can you try by setting force_parallel_mode = off;?  I think it is
sending the whole function execution to worker due to
force_parallel_mode.




No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name,setting from pg_settings where name 
in('max_parallel_workers_per_gather',
ipdr(>
'min_parallel_relation_size',
ipdr(>'parallel_tuple_cost',
ipdr(>
'force_parallel_mode');
  name   | setting
-+-
 force_parallel_mode | off
 max_parallel_workers_per_gather | 128
 min_parallel_relation_size  | 0
 parallel_tuple_cost | 0
(4 rows)

ipdr=> explain (analyze,buffers) select count(*) from (select 
a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
   QUERY PLAN
-
 Aggregate  (cost=87702.33..87702.34 rows=1 width=8) (actual 
time=709.643..709.643 rows=1 loops=1)
   Buffers: shared hit=65015
   ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) 
(actual time=706.382..708.456 rows=15015 loops=1)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=65015
 ->  Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual 
time=478.626..645.209 rows=180180 loops=1)
   Workers Planned: 11
   Workers Launched: 11
   Buffers: shared hit=65015
   ->  Partial HashAggregate  (cost=84149.78..84299.93 rows=15015 
width=20) (actual time=473.890..478.309 rows=15015 loops=12)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=63695
 ->  Parallel Seq Scan on test  (cost=0.00..72786.01 
rows=909101 width=20) (actual time=0.021..163.120 rows=83 loops=12)
   Buffers: shared hit=63695
 Planning time: 0.318 ms
 Execution time: 710.600 ms
(16 rows)


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4003.719..4003.720 
rows=1 loops=1)
   Buffers: shared hit=63869
 Planning time: 0.021 ms
 Execution time: 4003.769 ms
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] 
user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4001.275 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] 
CONTEXT:  SQL statement "select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t"
PL/pgSQL function parallel_test_plpgsql() line 5 at SQL statement


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4497.820..4497.822 
rows=1 loops=1)
   Buffers: shared hit=63695
 Planning time: 0.023 ms
 Execution time: 4497.872 ms
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] 
user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4497.050 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)
2016-09-16 18:03:23 MSK [29353]: [58-1] u

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
> > the current implementation of PostmasterIsAlive() uses a pipe to
> > monitor the existence of the postmaster process.
> > One end of the pipe is held open in the postmaster, while the other end is
> > inherited to all the auxiliary and background processes when they fork.
> > This leads to multiple processes calling select(2), poll(2) and read(2)
> > on the same end of the pipe.
> > While this is technically perfectly ok, it has the unfortunate side
> > effect that it triggers an inefficient behaviour[0] in the select/poll
> > implementation on some operating systems[1]:
> > The kernel can only keep track of one pid per select address and
> > thus has no other choice than to wakeup(9) every process that
> > is waiting on select/poll.
> 
> Yikes, that's a pretty absurd implementation.

Not when you take into account that it was written over 20 years ago ;)

> Does openbsd's kqueue implementation have the same issue? There's
> http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

Looks OK, but your mileage may vary. I've seen strange, subtle bugs
using kqueue...

struct kevent {
__uintptr_t ident;  /* identifier for this event */
short   filter; /* filter for event */
u_short flags;
u_int   fflags;
quad_t  data;
void*udata; /* opaque user data identifier */
};

> > Attached patch avoids the select contention by using a
> > separate pipe for each auxiliary and background process.
> 
> I'm quite unenthusiastic about forcing that many additional file
> descriptors onto the postmaster...

In our use case it's about 30.

> I'm not quite sure I understand why this an issue here - there shouldn't
> ever be events on this fd, so why is the kernel waking up all processes?
> It'd kinda makes sense it'd wake up all processes if there's one
> waiting, but ... ?

Every read is an event, and that's what PostmasterIsAlive does. 
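
(Roughly speaking, PostmasterIsAlive() boils down to a non-blocking read
on the watch end of the pipe, something along these lines, per
src/backend/storage/ipc/pmsignal.c:

	char	c;
	ssize_t	rc;

	rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return true;	/* pipe still open: postmaster is alive */
	return false;		/* EOF or unexpected data: treat as dead */

so each call registers the process's interest in the pipe.)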

   Marco




Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
> Thomas Munro  writes:
> > Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> > kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> > kqueue patch to perform better on large FreeBSD systems, it would also
> > be a solution to this problem.
> 
> I just noticed that kqueue appears to offer a solution to this problem,
> ie one of the things you can wait for is exit of another process (named
> by PID, looks like).  If that's portable to all kqueue platforms, then
> integrating a substitute for the postmaster death pipe might push that
> patch over the hump to being a net win.
 
That sounds plausible.
I could give this a try after I get back from my vacation :)
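
For the archives, a minimal untested sketch of what that might look like,
assuming a BSD with EVFILT_PROC (the names here are hypothetical):

#include <sys/types.h>
#include <sys/event.h>
#include <unistd.h>

/* Return a kqueue fd that becomes readable once the postmaster exits. */
static int
postmaster_death_kqueue(pid_t postmaster_pid)
{
	struct kevent kev;
	int			kq = kqueue();

	if (kq < 0)
		return -1;
	/* one-shot event delivered when the watched pid exits */
	EV_SET(&kev, postmaster_pid, EVFILT_PROC, EV_ADD | EV_ONESHOT,
		   NOTE_EXIT, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) < 0)
	{
		close(kq);
		return -1;
	}
	return kq;
}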

   Marco




Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

2016-09-16 Thread Marco Pfatschbacher
On Thu, Sep 15, 2016 at 04:24:07PM -0400, Tom Lane wrote:
> Marco Pfatschbacher  writes:
> > the current implementation of PostmasterIsAlive() uses a pipe to
> > monitor the existence of the postmaster process.
> > One end of the pipe is held open in the postmaster, while the other end is
> > inherited to all the auxiliary and background processes when they fork.
> > This leads to multiple processes calling select(2), poll(2) and read(2)
> > on the same end of the pipe.
> > While this is technically perfectly ok, it has the unfortunate side
> > effect that it triggers an inefficient behaviour[0] in the select/poll
> > implementation on some operating systems[1]:
> > The kernel can only keep track of one pid per select address and
> > thus has no other choice than to wakeup(9) every process that
> > is waiting on select/poll.
> 
> That seems like a rather bad kernel bug.

It's a limitation that has been there since at least 4.4BSD.
But yeah, I wish things were better.

> > In our case the system had to wakeup ~3000 idle ssh processes
> > every time postgresql did call PostmasterIsAlive.
> 
> Uh, these are processes not even connected to the pipe in question?
> That's *really* a kernel bug.

The kernel does not know whether they were selecting on that pipe,
because the only slot available to keep that mapping has already been taken.
It's not that bad of a performance hit if the system doesn't run
many processes.

> > Attached patch avoids the select contention by using a
> > separate pipe for each auxiliary and background process.
> 
> I think this would likely move the performance problems somewhere else.
> In particular, it would mean that every postmaster child would inherit
> pipes leading to all the older children. 

I kept them to a minimum. (See ClosePostmasterPorts.)

>  We could close 'em again
> I guess, but we have previously found that having to do things that
> way is a rather serious performance drag --- see the problems we had

I think closing a few files doesn't hurt too much, but I see your point.

> with POSIX named semaphores, here for instance:
> https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
> I really don't want the postmaster to be holding any per-child kernel
> resources.
> 
> It'd certainly be nice if we could find another solution besides
> the pipe-based one, but I don't think "more pipes" is the answer.
> There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
> when available; do the BSDen have anything like that?

Not that I know of.
But since the WalReceiver process seemed to be the one calling
PostmasterIsAlive far more often than the rest, maybe we could limit
the performance hit by not calling it on every received WAL chunk?

Cheers,
   Marco




Re: [HACKERS] [PROPOSAL][PROTOTYPE] Individual options for each index column: Opclass options

2016-09-16 Thread Robert Haas
On Tue, May 24, 2016 at 10:06 AM, Nikolay Shaplov
 wrote:
> So adding options for opclass seems to be really good idea.
> To see how it works you should do the following:
>
> # create extension intarray ;
> # create table test (i int[]);
> # create table test2 (i int[]);
> # create index ON test USING GIST (i USING gist__intbig_ops WITH OPTIONS
> (sig_len_int=22) );
> # create index ON test2 USING GIST (i USING gist__intbig_ops WITH OPTIONS
> (sig_len_int=120) );

I think supporting syntax of this type would be a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] OpenSSL 1.1 breaks configure and more

2016-09-16 Thread Christoph Berg
Re: To Heikki Linnakangas 2016-09-15 
<20160915213406.2mjlhcg7px3sa...@msg.df7cb.de>
> > Can you elaborate? Are you saying that Debian 9 (strect) will not ship
> > OpenSSL 1.0.2 anymore, and will require using OpenSSL 1.1.0?
> 
> I thought that was the plan, but upon asking on #debian-devel, it
> seems it's not set yet. I'll ask the maintainers directly and report
> back.

The plan is to ship only OpenSSL 1.1 in Stretch. (The list of packages
not yet ported is enormous, though, so I'm not yet sure it will really
happen.)

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827061


Re: Tom Lane 2016-09-16 <17025.1473977...@sss.pgh.pa.us>
> > Here is the result of backporting the sum of the two patches on top of 
> > REL9_4_STABLE. Not sure if we need this, but if we do we can apply this 
> > patch.
> 
> If someone's done the legwork, I think we would be well advised to
> back-patch.  Maybe not bother with 9.1 though.

Thanks for the patch!

I just tried to apply it to 9.2. There was a conflict in configure.in which was
trivial to resolve.

Another conflict in contrib/pgcrypto/pgcrypto.c was not applicable
because the code doesn't seem to exist (didn't try very hard though).

Ignoring the contrib conflict, it still didn't compile:

/home/cbe/projects/postgresql/debian/9.2/build/../src/backend/libpq/be-secure.c:
 In function ‘secure_write’:
/home/cbe/projects/postgresql/debian/9.2/build/../src/backend/libpq/be-secure.c:342:17:
 error: dereferencing pointer to incomplete type ‘SSL {aka struct ssl_st}’
if (port->ssl->state != SSL_ST_OK)
 ^~
/home/cbe/projects/postgresql/debian/9.2/build/../src/backend/libpq/be-secure.c:342:28:
 error: ‘SSL_ST_OK’ undeclared (first use in this function)
if (port->ssl->state != SSL_ST_OK)
^
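
(Presumably, under OpenSSL 1.1 that check would need to go through an
accessor, since the SSL struct is now opaque; an untested sketch:

	if (SSL_get_state(port->ssl) != TLS_ST_OK)

but porting that is beyond what the backported patch covers.)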

Christoph




Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Tom Lane
Sachin Kotwal  writes:
> Does it release locks after taking decision and then perform actual update
> operation on partition table?

No, there's no attempt to do that, and we're unlikely to consider doing so
because it would result in more lock-table thrashing.  Why do you care?
RowExclusiveLock does not block any normal DML operation, so there's no
apparent benefit from releasing it early.

regards, tom lane




Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-09-16 Thread Robert Haas
On Fri, Sep 16, 2016 at 9:45 AM, Jeevan Chalke
 wrote:
> 6. Per my understanding, I think checking for just aggregate function is
> enough as we are interested in whole aggregate result.

+1.

> Meanwhile I will
> check whether we need to find and check shippability of transition,
> combination and finalization functions or not.

I don't think that'd be appropriate.  The user is calling the
aggregate, not its constituent functions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-16 Thread Robert Haas
On Fri, Sep 16, 2016 at 9:47 AM, Pavan Deolasee
 wrote:
> On Fri, Sep 16, 2016 at 7:03 PM, Robert Haas  wrote:
>> On Thu, Sep 15, 2016 at 11:39 PM, Pavan Deolasee
>>  wrote:
>> > But I actually wonder if we are over engineering things and
>> > overestimating
>> > cost of memmove etc. How about this simpler approach:
>>
>> Don't forget that you need to handle the case where
>> maintenance_work_mem is quite small.
>
> How small? The default IIRC these days is 64MB and minimum is 1MB. I think
> we can do some special casing for very small values and ensure that things
> at the very least work and hopefully don't regress for them.

Sounds like you need to handle values as small as 1MB, then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Sachin Kotwal
Hi Tom,


Thanks for reply.

To take that decision it should need the locks only for a very small interval.
Does it release the locks after taking the decision and then perform the
actual update operation on the partition table?
I feel the update operation can take much longer than the planner's
examination, and will not require the lock in the later stages of query
execution.

Locking all partition tables blocks all queries (mostly of the "select *
..." kind) that want a lock on another partition.
If we were able to release the lock immediately after the planner's
examination, it would help other running queries get locks on the other
partitions.

I will be happy if we are able to reduce locking in the above scenario.


Regards,
Sachin

On Fri, Sep 16, 2016 at 7:16 PM, Tom Lane  wrote:

> Sachin Kotwal  writes:
> > In another Terminal :
>
> > postgres=# select locktype, database::regclass ,
> > relation::regclass,virtualtransaction, pid, mode , granted from pg_locks
> > where locktype='relation';
> >  locktype | database | relation | virtualtransaction |  pid  |   mode
> > | granted
> > --+--+--++--
> -+--+-
> >  relation | 13241| pg_locks | 3/3867 | 28635 |
> > AccessShareLock  | t
> >  relation | 13241| t1_p2| 2/14038| 28633 |
> > RowExclusiveLock | t
> >  relation | 13241| t1_p1| 2/14038| 28633 |
> > RowExclusiveLock | t
> >  relation | 13241| t1   | 2/14038| 28633 |
> > RowExclusiveLock | t
> > (4 rows)
>
> The planner must take some type of lock on each partition, because it
> has to examine that table and decide whether or not it needs to be
> scanned, and that at least requires locking the table's DDL state.
> So those locks will be there whether or not the query ultimately scans
> the tables.  This isn't a bug.
>
> regards, tom lane
>



-- 

Thanks and Regards,
Sachin Kotwal


Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server

2016-09-16 Thread Jeevan Chalke
On Mon, Sep 12, 2016 at 5:19 PM, Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
>
> On Mon, Sep 12, 2016 at 12:20 PM, Prabhat Sahu <
> prabhat.s...@enterprisedb.com> wrote:
>
>> Hi,
>>
>> While testing "Aggregate pushdown", i found the below error:
>> -- GROUP BY alias showing different behavior after adding patch.
>>
>> -- Create table "t1", insert few records.
>> create table t1(c1 int);
>> insert into t1 values(10), (20);
>>
>> -- Create foreign table:
>> create foreign table f_t1 (c1 int) server db1_server options (table_name
>> 't1');
>>
>> -- with local table:
>> postgres=# select 2 a, avg(c1) from t1 group by a;
>>  a | avg
>> ---+-
>>  2 | 15.
>> (1 row)
>>
>> -- with foreign table:
>> postgres=# select 2 a, avg(c1) from f_t1 group by a;
>> ERROR:  aggregate functions are not allowed in GROUP BY
>> CONTEXT:  Remote SQL command: EXPLAIN SELECT 2, avg(c1) FROM public.t1
>> GROUP BY 2
>>
>>
>>
> Thanks for reporting this bug in *v1.patch Prabhat.
>
> I will have a look over this issue and will post a fix in next version.
>

To fix this issue, we need to make the deparseConst() function aware of the
showtype flag, exactly like get_const_expr().  While deparsing a Const in the
GROUP BY clause, we need to emit "::typename" so that the remote server won't
treat the constant value as a column position in the target list, but rather
as a constant value.
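
With the fix, the Const in the GROUP BY clause gets deparsed with an
explicit cast, so the remote query would look something like this (a
sketch; the exact deparsed text may differ):

EXPLAIN SELECT 2, avg(c1) FROM public.t1 GROUP BY 2::integer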

Fixed this in the earlier attached patch and added a test case too.

-- 
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-16 Thread Amit Kapila
On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov  wrote:
> No, it doesn't.
> It parallelizes neither SQL functions nor plpgsql ones.
> Here is an example:
>
> ipdr=> show max_worker_processes ;
>  max_worker_processes
> --
>  128
> (1 row)
> ipdr=> set max_parallel_workers_per_gather to 128;
> SET
> ipdr=> set force_parallel_mode=on;
> SET
> ipdr=> set min_parallel_relation_size =0;
> SET
> ipdr=> set parallel_tuple_cost=0;
> SET
>

Can you try by setting force_parallel_mode = off;?  I think it is
sending the whole function execution to worker due to
force_parallel_mode.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-16 Thread Pavan Deolasee
On Fri, Sep 16, 2016 at 7:03 PM, Robert Haas  wrote:

> On Thu, Sep 15, 2016 at 11:39 PM, Pavan Deolasee
>  wrote:
> > But I actually wonder if we are over engineering things and
> overestimating
> > cost of memmove etc. How about this simpler approach:
>
> Don't forget that you need to handle the case where
> maintenance_work_mem is quite small.
>
>
How small? The default IIRC these days is 64MB and minimum is 1MB. I think
we can do some special casing for very small values and ensure that things
at the very least work and hopefully don't regress for them.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Tom Lane
Sachin Kotwal  writes:
> In another Terminal :

> postgres=# select locktype, database::regclass ,
> relation::regclass,virtualtransaction, pid, mode , granted from pg_locks
> where locktype='relation';
>  locktype | database | relation | virtualtransaction |  pid  |   mode
> | granted
> --+--+--++---+--+-
>  relation | 13241| pg_locks | 3/3867 | 28635 |
> AccessShareLock  | t
>  relation | 13241| t1_p2| 2/14038| 28633 |
> RowExclusiveLock | t
>  relation | 13241| t1_p1| 2/14038| 28633 |
> RowExclusiveLock | t
>  relation | 13241| t1   | 2/14038| 28633 |
> RowExclusiveLock | t
> (4 rows)

The planner must take some type of lock on each partition, because it
has to examine that table and decide whether or not it needs to be
scanned, and that at least requires locking the table's DDL state.
So those locks will be there whether or not the query ultimately scans
the tables.  This isn't a bug.

regards, tom lane




Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-16 Thread Tom Lane
Ashutosh Bapat  writes:
>> I'd suggest that this is parallel to nodeToString() and therefore
>> (a) should be placed beside it,

> Done. Added it after nodeToString().

Pushed, thanks.

regards, tom lane




Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Sachin Kotwal
Hi Ashutosh,

Thanks for reply.

Below are my findings:


In 1 Terminal:

postgres=# create table t1 (a int, b int);
CREATE TABLE
postgres=# show constraint_exclusion ;
 constraint_exclusion
--
 partition
(1 row)
postgres=# create table t1_p1() inherits (t1);
CREATE TABLE
postgres=# alter table t1_p1 add constraint a_part check (a > 0 and a <
100);
ALTER TABLE
postgres=# create table t1_p2() inherits (t1);
CREATE TABLE
postgres=# alter table t1_p2 add constraint a_part check (a > 100 and a <
200);
ALTER TABLE
postgres=# insert into t1_p1 select i,i from generate_series(1, 5) i;
INSERT 0 5
postgres=# insert into t1_p2 select i,i from generate_series(101, 105) i;
INSERT 0 5
postgres=# select * from t1_p1;
 a | b
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 5 | 5
(5 rows)
postgres=# select * from t1_p2;
  a  |  b
-+-
 101 | 101
 102 | 102
 103 | 103
 104 | 104
 105 | 105
(5 rows)
postgres=# begin;
BEGIN
postgres=# update t1 set b=555 where a=101;
UPDATE 1


In another Terminal :

postgres=# select locktype, database::regclass ,
relation::regclass,virtualtransaction, pid, mode , granted from pg_locks
where locktype='relation';
 locktype | database | relation | virtualtransaction |  pid  |   mode
| granted
--+--+--++---+--+-
 relation | 13241| pg_locks | 3/3867 | 28635 |
AccessShareLock  | t
 relation | 13241| t1_p2| 2/14038| 28633 |
RowExclusiveLock | t
 relation | 13241| t1_p1| 2/14038| 28633 |
RowExclusiveLock | t
 relation | 13241| t1   | 2/14038| 28633 |
RowExclusiveLock | t
(4 rows)


Hope the above findings will help you understand the problem.


Regards,
Sachin


On Fri, Sep 16, 2016 at 6:20 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Fri, Sep 16, 2016 at 4:31 PM, Sachin Kotwal 
> wrote:
> > Hi Hackers,
> >
> >
> > I checked if there  is update transaction on master table involved in
> > partition.
> > Postgresql takes  RowExclusiveLock on all partition tables.
> >
> > constraint exclusion is set to on.
>
> I checked this under the debugger and found that only the partitions
> which are scanned. The partitions excluded by constraints are not
> locked.
>
> postgres=# create table t1 (a int);
> CREATE TABLE
> postgres=# set constraint_exclusion to partition;
> SET
> postgres=# create table t1_p1() inherits (t1);
> CREATE TABLE
> postgres=# alter table t1_p1 add constraint a_part check (a > 0 and a <
> 100);
> ALTER TABLE
> postgres=# create table t1_p2() inherits (t1);
> CREATE TABLE
> postgres=# alter table t1_p2 add constraint a_part check (a > 100 and a <
> 200);
> ALTER TABLE
> postgres=# insert into t1_p1 select i from generate_series(1, 5) i;
> INSERT 0 5
> postgres=# insert into t1_p2 select i from generate_series(101, 105) i;
> INSERT 0 5
> postgres=# explain verbose select * from t1 where a > 100;
>  QUERY PLAN
> -
>  Append  (cost=0.00..41.88 rows=851 width=4)
>->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=4)
>  Output: t1.a
>  Filter: (t1.a > 100)
>->  Seq Scan on public.t1_p2  (cost=0.00..41.88 rows=850 width=4)
>  Output: t1_p2.a
>  Filter: (t1_p2.a > 100)
> (7 rows)
>
> postgres=# explain verbose update t1 set a = a where a > 100;
>   QUERY PLAN
> --
>  Update on public.t1  (cost=0.00..41.88 rows=851 width=10)
>Update on public.t1
>Update on public.t1_p2
>->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=10)
>  Output: t1.a, t1.ctid
>  Filter: (t1.a > 100)
>->  Seq Scan on public.t1_p2  (cost=0.00..41.88 rows=850 width=10)
>  Output: t1_p2.a, t1_p2.ctid
>  Filter: (t1_p2.a > 100)
> (9 rows)
>
> The RowExclusiveLock is taken in InitPlan(), which is called after the
> partitions have been excluded.
>
>  817│ foreach(l, resultRelations)
>  818│ {
>  819│ Index   resultRelationIndex =
> lfirst_int(l);
>  820│ Oid resultRelationOid;
>  821│ RelationresultRelation;
>  822│
>  823│ resultRelationOid =
> getrelid(resultRelationIndex, rangeTable);
>  824├>resultRelation =
> heap_open(resultRelationOid, RowExclusiveLock);
>  825│ InitResultRelInfo(resultRelInfo,
>  826│
>  resultRelation,
>  827│
> resultRelationIndex,
>  828│
> estate->es_instrument);
>  829│ resultRelInfo++;
>  830│ }
>
> It does lock the parent table, since inheritance allows to have rows
> in that table. If the constraints on that table are not enough to
> exclude it by conditions, it will be scanned.

Re: [HACKERS] WAL consistency check facility

2016-09-16 Thread Michael Paquier
On Fri, Sep 16, 2016 at 10:30 PM, Robert Haas  wrote:
> I don't think you have the right to tell Kuntal that he has to move
> the patch to the next CommitFest because there are unspecified things
> about the current version you don't like.  If you don't have time to
> review further, that's your call, but he can leave the patch as Needs
> Review and see if someone else has time.

No complaint from here if done this way. I don't mean any offense :)
-- 
Michael




Re: [HACKERS] WIP: About CMake v2

2016-09-16 Thread Michael Paquier
On Fri, Sep 16, 2016 at 9:58 AM, Michael Paquier
 wrote:
> Okay. That sounds good to me. I don't recall what your patch is
> exactly doing but could you still keep the vanilla Makefiles around?
> This will reduce the diff of the patch, and we'd need anyway to keep
> the former Makefile methods around until the buildfarm scripts are
> updated, the animals do the switch and then get green. So a period
> where both live in parallel is unavoidable.
>
> I heard as well that MySQL is using cmake... It may be interesting to
> see how they are integrating with it as a large project and avoid
> their past mistakes.

I could not resist so I just had a look at your patch. I had as well a
look around, and cmake really looks like a robust alternative to
./configure. In short, people doing now that:
./configure
make
make install
Would just do that:
cmake .
make
make install

I am no cmake guru yet, but VPATH builds are supported out of the box,
which is cool.

Your patch recommends to build with cmake after creating build/, now I
would expect most users to run cmake from the root folder. However
this causes all the Makefiles to get overwritten. As supporting all
platforms at once with cmake is going to be uncommitable, we are going
to need both methods able to live together for a while. Well, they can
coexist with this patch as long as cmake is not run from the root of
the code tree, which is acceptable for me as long as the switch is not
completed. However others may think differently on the matter.

Instead of getting support for all existing platforms, I would
recommend as well focusing only on one platform for the time being,
Linux, and get the work done correctly for that first. Once there is
something committed, we will be able to patch the buildfarm, and get
machines to switch to cmake one by one. After those are switched, we
could extend that. Another point of contention is support for
extensions. How long should we keep support for the existing PGXS? How
external extensions would compile with the new thing infrastructure?

Which brings me to another point, your patch is focused on features,
meaning that per-OS checks are all grouped by feature, but it may be a
good idea to split checks by OS if necessary, with for example
per-platform files and scripts in cmake/os/. And we could have just
something for Linux now.

A couple of issues I have noticed with your patch after a first set of tests:
- root's .gitignore needs to add entries for CMakeFiles and
cmake_install.cmake. You need as well to ignore CMakeCache
- A couple of headers are generated, like cubeparse.h (?)
- Currently a lot of users do things like that:
cd src/test/regress/ && make check
But this patch breaks that, and that's not cool. Recovery tests in
src/test/regress won't run either.

That's all I have for now. Looking forward to seeing some progress here.
-- 
Michael




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2016-09-16 Thread Robert Haas
On Thu, Sep 15, 2016 at 11:39 PM, Pavan Deolasee
 wrote:
> But I actually wonder if we are over engineering things and overestimating
> cost of memmove etc. How about this simpler approach:

Don't forget that you need to handle the case where
maintenance_work_mem is quite small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL consistency check facility

2016-09-16 Thread Robert Haas
On Thu, Sep 15, 2016 at 9:23 PM, Michael Paquier
 wrote:
> On Thu, Sep 15, 2016 at 7:30 PM, Kuntal Ghosh
>  wrote:
>> Thoughts?
>
> There are still a couple of things that this patch makes me unhappy,
> particularly the handling of the GUC with the xlogreader flags. I am
> not sure if I'll be able to look at that again within the next couple
> of weeks, but please be sure that this is registered in the next
> commit fest. You could for example do that by changing the patch from
> "Returned with Feedback" to "Moved to next CF" in the commit fest app.
> Be sure as well to spend a couple of cycles in reviewing patches.
> Usually for one patch sent, that's one patch of equal difficulty to
> review, and there are many patch still waiting for feedback.

I don't think you have the right to tell Kuntal that he has to move
the patch to the next CommitFest because there are unspecified things
about the current version you don't like.  If you don't have time to
review further, that's your call, but he can leave the patch as Needs
Review and see if someone else has time.

You are right that he should review some other people's patches, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2016-09-16 Thread Robert Haas
On Thu, Sep 15, 2016 at 8:52 PM, Craig Ringer  wrote:
> On 2 September 2016 at 23:29, Petr Jelinek  wrote:
>
>> You could put it to txid.c where all the other txid stuff is in?
>
> Yeah, even though it's in adt/ I think it'll do.
>
> I thought I'd need get_xid_in_recent_past() for catalog_xmin hot
> standby feedback, but upon closer examination the needed logic isn't
> the same anymore. txid_status() wants to ensure clog lookups are safe
> and limit by oldest xid, wheras the walsender doesn't actually care
> about that and is just avoiding wrapped xids.
>
> I'm just going back to how it was, all in adt/txid.c, and making it
> static again. We can move it and make it non-static if a need to do so
> comes up.
>
> Attached rebased patch updated and vs master.

You appear to have attached the wrong patch set.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel sec scan in plpgsql

2016-09-16 Thread Alex Ignatov

No, it doesn't.
It parallelizes neither SQL functions nor plpgsql ones.
Here is an example:

ipdr=> show max_worker_processes ;
 max_worker_processes
--
 128
(1 row)
ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set force_parallel_mode=on;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET

ipdr=> create table test as select (random ()*1000)::int % 3 as a,
ipdr-> (random ()*1000)::int % 5 as b,
ipdr-> (random ()*1000)::int % 7 as c,
ipdr-> (random ()*1000)::int % 11 as d,
ipdr-> (random ()*1000)::int % 13 as e,
ipdr-> (random ()*1000)::int % 17 as bytes
ipdr-> from generate_series(1,10*1000*1000);
SELECT 1000


ipdr=> create or replace function parallel_test_plpgsql() returns bigint as
ipdr-> $$
ipdr$> declare
ipdr$>cnt int:=0;
ipdr$> begin
ipdr$>select count(*) into cnt from (select a,b,c,d,e,sum(bytes) from test 
group by a,b,c,d,e)t;
ipdr$>return cnt;
ipdr$> end;
ipdr$> $$ language plpgsql PARALLEL SAFE  STRICT;
CREATE FUNCTION

ipdr=>
ipdr=> create or replace function parallel_test_sql() returns bigint as
ipdr-> $$
ipdr$>select count(*) from (select a,b,c,d,e,sum(bytes) from test group by 
a,b,c,d,e)t;
ipdr$> $$ language sql PARALLEL SAFE STRICT;
CREATE FUNCTION

ipdr=> analyze test;
ANALYZE
ipdr=> explain (analyze,buffers) select count(*) from (select 
a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;

   QUERY PLAN
-
 Aggregate  (cost=87702.33..87702.34 rows=1 width=8) (actual 
time=723.792..723.792 rows=1 loops=1)
   Buffers: shared hit=65015
   ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) 
(actual time=720.496..722.589 rows=15015 loops=1)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=65015
 ->  Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual 
time=502.607..665.039 rows=180180 loops=1)
   Workers Planned: 11
   Workers Launched: 11
   Buffers: shared hit=65015
   ->  Partial HashAggregate  (cost=84149.78..84299.93 rows=15015 
width=20) (actual time=497.106..501.170 rows=15015 loops=12)
 Group Key: test.a, test.b, test.c, test.d, test.e
 Buffers: shared hit=63695
 ->  Parallel Seq Scan on test  (cost=0.00..72786.01 
rows=909101 width=20) (actual time=0.018..166.556 rows=83 loops=12)
   Buffers: shared hit=63695
 Planning time: 0.250 ms
 Execution time: 724.293 ms
(16 rows)

ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();
   QUERY PLAN

 Gather  (cost=1000.00..1000.26 rows=1 width=8) (actual time=4088.952..4088.956 
rows=1 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   Buffers: shared hit=64186
   ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4084.997..4084.999 
rows=1 loops=1)
 Buffers: shared hit=64149
 Planning time: 0.025 ms
 Execution time: 4100.026 ms
(9 rows)

Log from auto_explain:
2016-09-16 16:05:11 MSK [28209]: [1-1] user=,db=,app=,client= LOG:  duration: 
4082.517 ms  plan:
Query Text: select count(*)  from (select a,b,c,d,e,sum(bytes) 
from test group by a,b,c,d,e)t
Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
->  Seq Scan on test  (cost=0.00..163696.15 rows=1115 
width=20)


ipdr=> explain (analyze,buffers) select  parallel_test_sql();
   QUERY PLAN

 Gather  (cost=1000.00..1000.26 rows=1 width=8) (actual time=4256.830..4256.837 
rows=1 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   Buffers: shared hit=64132
   ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual time=4252.401..4252.403 
rows=1 loops=1)
 Buffers: shared hit=64095
 Planning time: 0.151 ms
 Execution time: 4267.959 ms
(9 rows)

Log from auto_explain:
2016-09-16 16:22:03 MSK [731]: [1-1] user=,db=,app=,client= LOG:  duration: 
4249.851 ms  plan:
Query Text:
   select count(*) from (select a,b,c,d,e,sum(bytes) from test group by 
a,b,c,d,e)t;

Aggregate  (cost=289035.43..289035.44 rows=1 width=8)
  ->  HashAggregate  (cost=288697.59..288847.74 rows=150

Re: [HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Ashutosh Bapat
On Fri, Sep 16, 2016 at 4:31 PM, Sachin Kotwal  wrote:
> Hi Hackers,
>
>
> I checked if there  is update transaction on master table involved in
> partition.
> Postgresql takes  RowExclusiveLock on all partition tables.
>
> constraint exclusion is set to on.

I checked this under the debugger and found that only the partitions
which are scanned get locked. The partitions excluded by constraints are
not locked.

postgres=# create table t1 (a int);
CREATE TABLE
postgres=# set constraint_exclusion to partition;
SET
postgres=# create table t1_p1() inherits (t1);
CREATE TABLE
postgres=# alter table t1_p1 add constraint a_part check (a > 0 and a < 100);
ALTER TABLE
postgres=# create table t1_p2() inherits (t1);
CREATE TABLE
postgres=# alter table t1_p2 add constraint a_part check (a > 100 and a < 200);
ALTER TABLE
postgres=# insert into t1_p1 select i from generate_series(1, 5) i;
INSERT 0 5
postgres=# insert into t1_p2 select i from generate_series(101, 105) i;
INSERT 0 5
postgres=# explain verbose select * from t1 where a > 100;
 QUERY PLAN
-
 Append  (cost=0.00..41.88 rows=851 width=4)
   ->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=4)
 Output: t1.a
 Filter: (t1.a > 100)
   ->  Seq Scan on public.t1_p2  (cost=0.00..41.88 rows=850 width=4)
 Output: t1_p2.a
 Filter: (t1_p2.a > 100)
(7 rows)

postgres=# explain verbose update t1 set a = a where a > 100;
  QUERY PLAN
--
 Update on public.t1  (cost=0.00..41.88 rows=851 width=10)
   Update on public.t1
   Update on public.t1_p2
   ->  Seq Scan on public.t1  (cost=0.00..0.00 rows=1 width=10)
 Output: t1.a, t1.ctid
 Filter: (t1.a > 100)
   ->  Seq Scan on public.t1_p2  (cost=0.00..41.88 rows=850 width=10)
 Output: t1_p2.a, t1_p2.ctid
 Filter: (t1_p2.a > 100)
(9 rows)

The RowExclusiveLock is taken in InitPlan(), which is called after the
partitions have been excluded.

 817│ foreach(l, resultRelations)
 818│ {
 819│ Index   resultRelationIndex =
lfirst_int(l);
 820│ Oid resultRelationOid;
 821│ RelationresultRelation;
 822│
 823│ resultRelationOid =
getrelid(resultRelationIndex, rangeTable);
 824├>resultRelation =
heap_open(resultRelationOid, RowExclusiveLock);
 825│ InitResultRelInfo(resultRelInfo,
 826│   resultRelation,
 827│
resultRelationIndex,
 828│
estate->es_instrument);
 829│ resultRelInfo++;
 830│ }

It does lock the parent table, since inheritance allows to have rows
in that table. If the constraints on that table are not enough to
exclude it by conditions, it will be scanned.

Am I missing something? It might help to have the SQL commands you are
running. Also, can you please explain why you think all the
partitions are locked in RowExclusiveLock mode?




Re: [HACKERS] README of hash index

2016-09-16 Thread Kenneth Marshall
On Fri, Sep 16, 2016 at 04:50:53PM +0530, Amit Kapila wrote:
> Currently README of hash module contain algorithms written in below form.
> 
> The insertion algorithm is rather similar:
> 
> pin meta page and take buffer content lock in shared mode
> loop:
> compute bucket number for target hash key
> release meta page buffer content lock
> if (correct bucket page is already locked)
> break
> release any existing bucket page lock (if a concurrent split happened)
> take heavyweight bucket lock in shared mode
> retake meta page buffer content lock in shared mode
> -- (so far same as reader)
> release pin on metapage
> ..
> ..
> 
> I have mostly updated them in the patches I have proposed to improve
> hash index.  However, each time I try to update them, I find that it
> is easy to follow the code than to read and understand the existing
> algorithm written in above form from README.
> 
> Do others find it useful to maintain the algorithms in above form?
> 

Hi Amit,

Speaking for myself, I think it does help to have a text description
of the algorithm to use as a guide while trying to understand the code.
For beginners (me), it is not always obvious what a section of code is
doing.

Regards,
Ken




[HACKERS] Improvements in psql hooks for variables

2016-09-16 Thread Daniel Verite
 Hi,

Following the discussion on forbidding an AUTOCOMMIT off->on
switch mid-transaction [1], attached is a patch that lets the hooks
return a boolean indicating whether a change is allowed.

Using the hooks, bogus assignments to built-in variables can
be dealt with more strictly.

For example, pre-patch behavior:

  =# \set ECHO errors
  =# \set ECHO on
  unrecognized value "on" for "ECHO"; assuming "none"
  =# \echo :ECHO
  on

which has two problems:
- we have to assume a value, even though we can't know what the user meant.
- after assignment, the user-visible value of the variable diverges from its
internal counterpart (pset.echo in this case).


Post-patch:
  =# \set ECHO errors
  =# \set ECHO on
  unrecognized value "on" for "ECHO"
  \set: error while setting variable
  =# \echo :ECHO
  errors

Both the internal pset.* state and the user-visible value are kept unchanged
if the input value is incorrect.

Concerning AUTOCOMMIT, autocommit_hook() could return false to forbid
a switch when the conditions are not met.
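
For instance, a sketch of what that could look like (not part of the
attached patch; the mid-transaction test below is hypothetical and builds
on generic_boolean_hook() from the patch):

static bool
autocommit_hook(const char *newval)
{
	bool		newautocommit;

	if (!generic_boolean_hook(newval, "AUTOCOMMIT", &newautocommit))
		return false;
	/* hypothetical guard: forbid the off->on switch mid-transaction */
	if (newautocommit && !pset.autocommit && pset.db &&
		PQtransactionStatus(pset.db) == PQTRANS_INTRANS)
	{
		psql_error("cannot set AUTOCOMMIT to on inside a transaction\n");
		return false;
	}
	pset.autocommit = newautocommit;
	return true;
}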


Another user-visible effect of the patch is that using a bogus value
for a built-in variable on the command line becomes a fatal error
that prevents psql from continuing.
This is not directly intended by the patch, but is a consequence
of SetVariable() failing.

Example:
  $ ./psql -vECHO=bogus
  unrecognized value "bogus" for "ECHO"
  psql: could not set variable "ECHO"
  $ echo $?
  1

The built-in vars concerned by the change are:

booleans: AUTOCOMMIT, ON_ERROR_STOP, QUIET, SINGLELINE, SINGLESTEP

non-booleans: ECHO, ECHO_HIDDEN, ON_ERROR_ROLLBACK, COMP_KEYWORD_CASE,
 HISTCONTROL, VERBOSITY, SHOW_CONTEXT

We could go further to close the gap between pset.* and the built-in
variables, by changing how they're initialized and forbidding deletion
as Tom suggests in [2], but if there's negative feedback on the above
changes, I think we should hear it first.

[1]
https://www.postgresql.org/message-id/f2cb5838-0ee9-4fe3-acc0-df77aeb7d4c7%40mm
[2]
https://www.postgresql.org/message-id/4695.1473961140%40sss.pgh.pa.us


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index 7ce05fb..9dcfc0a 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -786,43 +786,59 @@ showVersion(void)
  * This isn't an amazingly good place for them, but neither is anywhere else.
  */
 
-static void
+/*
+ * Hook to set an internal flag from a user-supplied string value.
+ * If the syntax is correct, affect *flag and return true.
+ * Otherwise, keep *flag untouched and return false.
+ */
+static bool
+generic_boolean_hook(const char *newval, const char* varname, bool *flag)
+{
+   bool isvalid;
+   bool val = ParseVariableBool(newval, varname, &isvalid);
+   if (isvalid)
+   *flag = val;
+   return isvalid;
+}
+
+static bool
 autocommit_hook(const char *newval)
 {
-   pset.autocommit = ParseVariableBool(newval, "AUTOCOMMIT");
+   return generic_boolean_hook(newval, "AUTOCOMMIT", &pset.autocommit);
 }
 
-static void
+static bool
 on_error_stop_hook(const char *newval)
 {
-   pset.on_error_stop = ParseVariableBool(newval, "ON_ERROR_STOP");
+   return generic_boolean_hook(newval, "ON_ERROR_STOP", 
&pset.on_error_stop);
 }
 
-static void
+static bool
 quiet_hook(const char *newval)
 {
-   pset.quiet = ParseVariableBool(newval, "QUIET");
+   return generic_boolean_hook(newval, "QUIET", &pset.quiet);
 }
 
-static void
+static bool
 singleline_hook(const char *newval)
 {
-   pset.singleline = ParseVariableBool(newval, "SINGLELINE");
+   return generic_boolean_hook(newval, "SINGLELINE", &pset.singleline);
 }
 
-static void
+static bool
 singlestep_hook(const char *newval)
 {
-   pset.singlestep = ParseVariableBool(newval, "SINGLESTEP");
+   return generic_boolean_hook(newval, "SINGLESTEP", &pset.singlestep);
 }
 
-static void
+static bool
 fetch_count_hook(const char *newval)
 {
pset.fetch_count = ParseVariableNum(newval, -1, -1, false);
+   return true;
 }
 
-static void
+static bool
 echo_hook(const char *newval)
 {
if (newval == NULL)
@@ -837,39 +853,52 @@ echo_hook(const char *newval)
pset.echo = PSQL_ECHO_NONE;
else
{
-   psql_error("unrecognized value \"%s\" for \"%s\"; assuming 
\"%s\"\n",
-  newval, "ECHO", "none");
-   pset.echo = PSQL_ECHO_NONE;
+   psql_error("unrecognized value \"%s\" for \"%s\"\n",
+  newval, "ECHO");
+   return false;
}
+   return true;
 }
 
-static void
+static bool
 echo_hidden_hook(const char *newval)
 {
if (newval == NULL)
pset.echo_hidden = PSQL_ECHO_HIDDEN_OFF;
else if (pg_strcasecmp(newval, "noexec") == 0)
pset.echo_hidden = PSQL_ECHO_HIDDEN_NOEXEC;
-   else if (ParseV

Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2016-09-16 Thread Rajkumar Raghuwanshi
On Fri, Sep 9, 2016 at 3:17 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Hi All,
>
> PFA the patch to support partition-wise joins for partitioned tables. The
> patch
> is based on the declarative parition support patches provided by Amit
> Langote
> on 26th August 2016.
>

I have applied the declarative partitioning patches posted by Amit Langote on
26 Aug 2016 and then the partition-wise join patch, and I am getting the below
error from make install.

../../../../src/include/nodes/relation.h:706: error: redefinition of typedef ‘PartitionOptInfo’
../../../../src/include/nodes/relation.h:490: note: previous declaration of ‘PartitionOptInfo’ was here
make[4]: *** [gistbuild.o] Error 1
make[4]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access/gist'
make[3]: *** [gist-recursive] Error 2
make[3]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend/access'
make[2]: *** [access-recursive] Error 2
make[2]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src/backend'
make[1]: *** [all-backend-recurse] Error 2
make[1]: Leaving directory `/home/edb/Desktop/edb_work/WORKDB/PG/postgresql/src'
make: *** [all-src-recurse] Error 2

PS : I am using - gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17)
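
For context, the failure comes from the C rules that GCC 4.4 enforces:
repeating a typedef only became legal in C11. A minimal illustration
(names are made up):

  typedef struct Foo Foo;   /* first declaration: fine */
  typedef struct Foo Foo;   /* gcc 4.4: error: redefinition of typedef 'Foo' */

  struct Foo;               /* a plain forward declaration of the tag, */
  struct Foo;               /* by contrast, may be repeated freely */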

Attached the patch for the fix of above error.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 1e9fed9..963b022 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -487,7 +487,7 @@ typedef enum RelOptKind
 	((reloptkind) == RELOPT_OTHER_MEMBER_REL || \
 	 (reloptkind) == RELOPT_OTHER_JOINREL)
 
-typedef struct PartitionOptInfo PartitionOptInfo;
+struct PartitionOptInfo;
 
 typedef struct RelOptInfo
 {
@@ -561,7 +561,7 @@ typedef struct RelOptInfo
 	/*
 	 * TODO: Notice recursive usage of RelOptInfo.
 	 */
-	PartitionOptInfo	*part_info;
+	struct PartitionOptInfo	*part_info;
 
 	/* Set only for "other" base or join relations. */
 	Relids		parent_relids;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] README of hash index

2016-09-16 Thread Amit Kapila
Currently, the README of the hash module contains algorithms written in the form below.

The insertion algorithm is rather similar:

pin meta page and take buffer content lock in shared mode
loop:
compute bucket number for target hash key
release meta page buffer content lock
if (correct bucket page is already locked)
break
release any existing bucket page lock (if a concurrent split happened)
take heavyweight bucket lock in shared mode
retake meta page buffer content lock in shared mode
-- (so far same as reader)
release pin on metapage
..
..

I have mostly updated them in the patches I have proposed to improve
hash index.  However, each time I try to update them, I find that it
is easier to follow the code than to read and understand the existing
algorithms written in the above form in the README.

Do others find it useful to maintain the algorithms in this form?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why postgres take RowExclusiveLock on all partition

2016-09-16 Thread Sachin Kotwal
Hi Hackers,


I noticed that when an UPDATE transaction runs on the master table of a
partitioned set, PostgreSQL takes RowExclusiveLock on all partition
tables.

constraint_exclusion is set to on.

My question is: why does it lock all partition tables instead of only
the one partition table where the data resides?
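
For example, the behavior can be reproduced like this (table and column
names are hypothetical; run the UPDATE in one session and inspect
pg_locks from another):

  CREATE TABLE measurement (id int, logdate date);
  CREATE TABLE measurement_2016_01
    (CHECK (logdate >= '2016-01-01' AND logdate < '2016-02-01'))
    INHERITS (measurement);
  CREATE TABLE measurement_2016_02
    (CHECK (logdate >= '2016-02-01' AND logdate < '2016-03-01'))
    INHERITS (measurement);

  BEGIN;
  UPDATE measurement SET id = id + 1 WHERE logdate = '2016-01-15';

  -- in another session: both partitions show RowExclusiveLock, even
  -- though constraint exclusion lets the executor touch only one
  SELECT relation::regclass, mode
  FROM pg_locks
  WHERE locktype = 'relation'
    AND relation::regclass::text LIKE 'measurement%';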


Feel free to ask if any further information is required.


-- 

Thanks and Regards,
Sachin Kotwal


Re: [HACKERS] Block level parallel vacuum WIP

2016-09-16 Thread Masahiko Sawada
On Thu, Sep 15, 2016 at 11:44 PM, Robert Haas  wrote:
> On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada  
> wrote:
>> I'm implementing this patch, but I need to resolve a problem
>> regarding the lock for extension taken by multiple parallel workers.
>> In parallel vacuum, multiple workers could try to acquire the
>> exclusive lock for extension on the same relation.
>> Since acquiring the exclusive lock for extension by multiple workers
>> is regarded as locking from within the same locking group, multiple
>> workers can extend the fsm or vm at the same time and end up with an
>> error.
>> I thought that this might also affect parallel update operations, so
>> I'd like to discuss it in advance.
>
> Hmm, yeah.  This is one of the reasons why parallel queries currently
> need to be entirely read-only.  I think there's a decent argument that
> the relation extension lock mechanism should be entirely redesigned:
> the current system is neither particularly fast nor particularly
> elegant, and some of the services that the heavyweight lock manager
> provides, such as deadlock detection, are not relevant for relation
> extension locks.  I'm not sure if we should try to fix that right away
> or come up with some special-purpose hack for vacuum, such as having
> backends use condition variables to take turns calling
> visibilitymap_set().
>

Yeah, I don't have a good solution for this problem so far.
We might need to improve the group locking mechanism for updating
operations, or come up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
enough vm and fsm pages in advance, to avoid having parallel workers
extend those forks, but that doesn't resolve the fundamental problem.
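
To make the turn-taking idea from Robert's mail a bit more concrete,
here is a rough sketch (purely illustrative: it assumes the
ConditionVariable primitives proposed in a separate thread plus
port/atomics.h, and every name below is hypothetical, not part of any
posted patch):

/*
 * Shared state, e.g. in the parallel vacuum's DSM segment; assumed to be
 * initialized elsewhere with ConditionVariableInit() and pg_atomic_init_u32().
 */
typedef struct VacuumExtendTurn
{
	ConditionVariable	cv;
	pg_atomic_uint32	holder;		/* 0 = free, else 1 + worker number */
} VacuumExtendTurn;

static void
take_turn(VacuumExtendTurn *turn, uint32 me)
{
	uint32		expected = 0;

	ConditionVariablePrepareToSleep(&turn->cv);
	/* CAS loop, sleeping on the CV while someone else holds the turn */
	while (!pg_atomic_compare_exchange_u32(&turn->holder, &expected, me + 1))
	{
		expected = 0;
		ConditionVariableSleep(&turn->cv, PG_WAIT_EXTENSION);
	}
	ConditionVariableCancelSleep();
	/* ... now safe to call visibilitymap_set() or extend the fsm ... */
}

static void
release_turn(VacuumExtendTurn *turn)
{
	pg_atomic_write_u32(&turn->holder, 0);
	ConditionVariableBroadcast(&turn->cv);
}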

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Printing bitmap objects in the debugger

2016-09-16 Thread Ashutosh Bapat
>
> I'd suggest that this is parallel to nodeToString() and therefore
> (a) should be placed beside it,

Done. Added it after nodeToString().

> (b) should be named like it,
> bmsToString() perhaps,

bmsToString() is fine. Used that name.

> and (c) should look more like it internally.
>

Done.

I have also added a declaration for this function in nodes.h after the
definition of struct Bitmapset.  Without this declaration, the compiler
gives a "no previous declaration" warning for this function.

Tested it under the debugger

Breakpoint 1, make_join_rel (root=0x20cafb0, rel1=0x20e2998,
rel2=0x20dd2c0) at joinrels.c:664
(gdb) p bmsToString(rel1->relids)
$1 = 0x2102fd0 "(b 1 3)"
(gdb) p bmsToString(rel2->relids)
$2 = 0x2104bc0 "(b 4)"
(gdb) p bmsToString(joinrelids)
$3 = 0x2104fd8 "(b 1 3 4)"
(gdb) p bmsToString(joinrel->relids)
$4 = 0x2105998 "(b 1 3 4)"
(gdb) p bmsToString(joinrel->lateral_relids)
$5 = 0x2105db0 "(b)"
(gdb) p joinrel->lateral_relids
$6 = (Relids) 0x0

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 29b7712..e1bbcf7 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -3886,10 +3886,25 @@ outNode(StringInfo str, const void *obj)
 char *
 nodeToString(const void *obj)
 {
 	StringInfoData str;
 
 	/* see stringinfo.h for an explanation of this maneuver */
 	initStringInfo(&str);
 	outNode(&str, obj);
 	return str.data;
 }
+
+/*
+ * bmsToString -
+ * 		returns the ascii representation of the Bitmapset as a palloc'd string
+ */
+char *
+bmsToString(const Bitmapset *bms)
+{
+	StringInfoData str;
+
+	/* see stringinfo.h for an explanation of this maneuver */
+	initStringInfo(&str);
+	outBitmapset(&str, bms);
+	return str.data;
+}
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 2f7efa8..b239b99 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -554,20 +554,21 @@ extern PGDLLIMPORT Node *newNodeMacroHolder;
 extern char *nodeToString(const void *obj);
 
 struct Bitmapset;			/* not to include bitmapset.h here */
 struct StringInfoData;			/* not to include stringinfo.h here */
 extern void outNode(struct StringInfoData *str, const void *obj);
 extern void outToken(struct StringInfoData *str, const char *s);
 extern void outBitmapset(struct StringInfoData *str,
 			 const struct Bitmapset *bms);
 extern void outDatum(struct StringInfoData *str, uintptr_t value,
 		 int typlen, bool typbyval);
+extern char *bmsToString(const struct Bitmapset *bms);
 
 /*
  * nodes/{readfuncs.c,read.c}
  */
 extern void *stringToNode(char *str);
 extern struct Bitmapset *readBitmapset(void);
 extern uintptr_t readDatum(bool typbyval);
 extern bool *readBoolCols(int numCols);
 extern int *readIntCols(int numCols);
 extern Oid *readOidCols(int numCols);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] standalone backend PANICs during recovery

2016-09-16 Thread Bernd Helmle


--On 3. September 2016 um 02:05:00 -0400 Tom Lane  wrote:

>> Well, the "use case" was a crashed hot standby at a customer site (it
>> kept PANICing after restarting), where i decided to have a look through
>> the recovery process with gdb. The exact PANIC was
> 
>> 2016-03-29 15:12:40 CEST [16097]: [26-1] user=,db=,xid=0,vxid=1/0 FATAL:
>> unexpected GIN leaf action: 0
> 
> BTW, that didn't happen to be on big-endian hardware did it?

You're right, this was RHEL7 on a POWER7 machine.

-- 
Thanks

Bernd


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2016-09-16 Thread Pavel Stehule
2016-09-16 1:44 GMT+02:00 Craig Ringer :

> On 15 September 2016 at 19:31, Pavel Stehule 
> wrote:
>
> > b_expr enforces shift/reduce conflict :(
>
> No problem then. I just thought it'd be worth allowing more if it
> worked to do so.
>
> > I found another open question - how can we translate an empty tag to
> > an SQL value?
> > Oracle does not have to solve this question, but PostgreSQL does. Some
> > databases return an empty string.
>
> Oracle doesn't solve the problem? it ERRORs?
>

Oracle returns NULL. But in Oracle there is no difference between NULL
and an empty string.
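
For example, under the empty-string behavior being discussed, something
like this (assuming the XMLTABLE syntax from the proposed patch;
illustrative only) would return '' rather than NULL for the empty tag:

  SELECT *
  FROM xmltable('/rows/row'
                PASSING '<rows><row><a>x</a><b/></row></rows>'::xml
                COLUMNS a text PATH 'a',
                        b text PATH 'b');
  -- expected: a = 'x', b = '' (an empty string, distinguishable from NULL)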

Regards

Pavel


>
> > I prefer to return an empty string - not NULL - in this case.
>
> I agree, and that's consistent with how most XML is interpreted. XSLT
> for example considers  and  to be pretty much the same
> thing.
>
> > The reason is simple - an empty string is some information, and NULL
> > is less information. When necessary, I can transform an empty string
> > to NULL - the other direction is not unique.
>
> Yep, I definitely agree. The only issue is if people want a DEFAULT to
> be applied for empty tags. But that's something they can do in a
> post-process pass easily enough, since XMLTABLE is callable as a
> subquery / WITH expression / etc.
>
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] Hash Indexes

2016-09-16 Thread Amit Kapila
On Thu, Sep 15, 2016 at 10:38 PM, Jesper Pedersen
 wrote:
> On 09/15/2016 02:03 AM, Amit Kapila wrote:
>>>
>>> Same thing here - where the fields involving the hash index aren't
>>> updated.
>>>
>>
>> Do you mean that for such cases also you see 40-60% gain?
>>
>
> No, UPDATEs are around 10-20% for our cases.
>

Okay, good to know.

>>
>> It might be useful to test with higher number of rows because with so
>> less data contention is not visible,
>
>
> Attached is a run with 1000 rows.
>

I think 1000 is also too few; you probably want to run it with 100,000 or
more rows.  I suspect that the reason you are seeing the large
difference between btree and hash index is that the range of values is
narrow, so there may be many overflow pages.
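
For instance, a setup along these lines (purely illustrative) gives a
large number of rows with a wide range of distinct key values:

  CREATE TABLE hash_test (id int, val int);
  INSERT INTO hash_test SELECT i, i FROM generate_series(1, 1000000) i;
  CREATE INDEX hash_test_val_idx ON hash_test USING hash (val);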

>>
>
> I think for CHI it would be Robert's and others' feedback. For WAL, there
> is [1].
>

I have addressed your feedback for WAL and posted the patch.  I think the
remaining thing to handle for the Concurrent Hash Index patch is to remove
the usage of hashscan.c from the code, if no one objects to it; do let me
know if I am missing something here.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Stopping logical replication protocol

2016-09-16 Thread Vladimir Gordiychuk
>Have you had a chance to look at adding the tests we discussed?

Not yet. I plan to do it this weekend.

2016-09-16 4:37 GMT+03:00 Craig Ringer :

> On 9 September 2016 at 12:03, Craig Ringer  wrote:
>
> > Setting "waiting on author" in CF per discussion of the need for tests.
>
> Have you had a chance to look at adding the tests we discussed?
>
> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>