Re: Locking B-tree leafs immediately in exclusive mode

2018-07-27 Thread Alexander Korotkov
On Fri, Jul 27, 2018 at 6:11 PM Alexander Korotkov wrote:
> On Fri, Jul 27, 2018 at 12:30 AM Simon Riggs  wrote:
> > On 26 July 2018 at 20:59, Alexander Korotkov wrote:
> >
> > > Great, thank you!  So, I think the regression is demystified.  We can
> > > now conclude that on our benchmarks this patch doesn't cause
> > > performance regression larger than measurement error.  But in some
> > > cases it shows huge performance benefit.
> > >
> > > So, I'm going to commit this, if no objections.
> >
> > +1 to commit.
> >
> > What will the commit message be?
> >
> > For me, this is about reducing contention on index leaf page hotspots,
> > while at the same time reducing the path length of lock acquisition on
> > leaf pages
>
> So, reducing the path length of lock acquisition is the particular technical
> change made, while reducing contention on index leaf pages is the result.  I
> think the reduced path length of lock acquisition should be mentioned in the
> title of the commit message, while the contention reduction should be
> mentioned in the body, because it's the motivation for this commit.  If we
> have a release notes item for this commit, it should also mention the
> contention reduction, because that's the user-visible effect of this commit.

So, I've pushed it.  Thanks to everybody for the review.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-27 Thread Alexander Korotkov
On Fri, Jul 27, 2018 at 12:30 AM Simon Riggs  wrote:
> On 26 July 2018 at 20:59, Alexander Korotkov wrote:
>
> > Great, thank you!  So, I think the regression is demystified.  We can
> > now conclude that on our benchmarks this patch doesn't cause
> > performance regression larger than measurement error.  But in some
> > cases it shows huge performance benefit.
> >
> > So, I'm going to commit this, if no objections.
>
> +1 to commit.
>
> What will the commit message be?
>
> For me, this is about reducing contention on index leaf page hotspots,
> while at the same time reducing the path length of lock acquisition on
> leaf pages

So, reducing the path length of lock acquisition is the particular technical
change made, while reducing contention on index leaf pages is the result.  I
think the reduced path length of lock acquisition should be mentioned in the
title of the commit message, while the contention reduction should be
mentioned in the body, because it's the motivation for this commit.  If we
have a release notes item for this commit, it should also mention the
contention reduction, because that's the user-visible effect of this commit.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-26 Thread Simon Riggs
On 26 July 2018 at 20:59, Alexander Korotkov  wrote:

> Great, thank you!  So, I think the regression is demystified.  We can
> now conclude that on our benchmarks this patch doesn't cause
> performance regression larger than measurement error.  But in some
> cases it shows huge performance benefit.
>
> So, I'm going to commit this, if no objections.

+1 to commit.

What will the commit message be?

For me, this is about reducing contention on index leaf page hotspots,
while at the same time reducing the path length of lock acquisition on
leaf pages

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-26 Thread Alexander Korotkov
Hi!

On Thu, Jul 26, 2018 at 1:31 PM Imai, Yoshikazu wrote:
> On Wed, July 25, 2018 at 0:28 AM, Alexander Korotkov wrote:
> > On Tue, Jul 10, 2018 at 4:39 PM 今井 良一 wrote:
> > I've reread the thread.  And I found that in my initial letter [1] I
> > forgot to include the index definition.
> > CREATE INDEX unordered_i_idx ON unordered (i);
> >
> > So, when you did experiments [2], did you define any index on
> > unordered table?  If not, then it's curious why there is any
> > performance difference at all.
> >
> > 1. 
> > https://www.postgresql.org/message-id/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com
> > 2. 
> > https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451%40g01jpexmbkw24
>
> My experiments [1][2] were flawed in that I didn't create an index in the
> unordered and duplicated cases.
>
> On Mon, July 9, 2018 at 3:19 AM, I wrote:
> > # script_duplicated.sql
> > INSERT INTO unordered VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');
> I also made a mistake in duplicated.sql: I used the unordered table in the
> duplicated experiment...
>
> I re-ran the experiments.
> This time I created indexes on the unordered and duplicated tables, and
> confirmed with psql's \d command that the indexes exist in both master and
> patched.
> I used AWS c5.18xlarge, and executed pgbench with 36 clients.
>
> # results with indexes created
> master,  ordered,Ave 252796 TPS (253122,252469)
> patched, ordered,Ave 314541 TPS (314011,315070)
> master,  unordered,  Ave 409591 TPS 
> (413901,412010,404912,404607,416494,405619)
> patched, unordered,  Ave 412765 TPS (409626,408578,416900,415955)
> master,  duplicated, Ave 44811 TPS (45139,44482)
> patched, duplicated, Ave 202325 TPS (201519,203130)
>
> The TPS of "master, ordered" and "patched, ordered" is similar to the TPS of
> "master, key1 with 1 value" in the previous experiment [3] (contention
> concentrated), and the TPS of "master, unordered" and "patched, unordered" is
> similar to the TPS of "master, key1 with 100 values" in the previous
> experiment [3] (contention dispersed).
> Therefore I think this experiment and the previous experiment [3] that Simon
> asked for are correct.
>
> The TPS of "patched, duplicated" was almost 4.5 times that of "master,
> duplicated".
>
>
> About the small performance regression in the previous experiments [1], I
> think it is due to the machine's condition.
> I don't have a rational reason or evidence, so I'll just describe what I
> thought and did.
>
> In these experiments, in order to stabilize the machine's performance, I
> warmed the machine up with the batch below.
>
> ## DDL
> CREATE UNLOGGED TABLE improve_performance (i integer not null, value text not null);
> CREATE INDEX improve_i_idx  on improve_performance (i);
>
> ## improve_performance.sql
> \set i random(1, 100)
> INSERT INTO improve_performance VALUES (:i, 'abcdefghijklmnopqrstuvwxyz');
>
> ## test batch
> port=$1
> client=$2
> j=$3
> tablenames_all=(ordered unordered duplicated ordered2)
> tablenames=(unordered)
>
> function cleaning() {
>   for tablename_ in ${tablenames_all[@]}; do
> psql -p ${port} -c "truncate table ${tablename_};"
>   done
>
>   psql -p ${port} -c "vacuum full analyze;"
> }
>
>
> pgbench -p ${port} -T 60 -P 1 -M prepared -f improve_performance.sql -c 72 -j 36 postgres # warm-up run
> cleaning
>
> for tablename in ${tablenames[@]}; do
>   for i in `seq 0 1`; do
> pgbench -p ${port} -T 60 -P 1 -M prepared -f script_${tablename}.sql -c ${client} -j ${j} postgres
> cleaning
>   done
> done
>
>
> When I created an index on the improve_performance table and executed the
> test batch against the unordered table on the patched version three times,
> the results were:
> 1. 379644, 381409
> 2. 391089, 391741
> 3. 383141, 367851
>
> These results are slightly lower than "patched, key1 with 100 values:
> 405115" (unordered) in my previous experiments [3].
>
> I thought that warming up with pgbench against a table with an index might
> not warm the machine completely, because in experiments [3] I warmed up
> against a table without an index.
>
> So I dropped the index from the improve_performance table and executed the
> test batch against the patched version two times; the results were:
> 1. 409626, 408578
> 2. 416900, 415955
>
> Even within the same test, performance varies widely, so I think a small
> regression can be ignored.
> Of course, I still wonder why there were regressions in all cases in
> experiments [2].
>
>
> [1] 
> https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51186E9D%40g01jpexmbkw24
> [2] 
> https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451@g01jpexmbkw24
> [3] 
> https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A5118C7C3%40g01jpexmbkw24

Great, thank you!  So, I think the regression is demystified.  We can
now conclude that on our benchmarks this patch doesn't cause
performance regression larger than measurement error.  But in some
cases it shows huge performance benefit.

So, I'm going to commit this, if no objections.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

RE: Locking B-tree leafs immediately in exclusive mode

2018-07-26 Thread Imai, Yoshikazu
On Wed, July 25, 2018 at 0:30 AM, Alexander Korotkov wrote:
> > The remaining task in my review is running the regression tests.
> 
> Cool, thank you for review!
> 

I did "make check-world" on the patched version and all tests passed
successfully!


Yoshikazu Imai





RE: Locking B-tree leafs immediately in exclusive mode

2018-07-26 Thread Imai, Yoshikazu
Hi.
On Wed, July 25, 2018 at 0:28 AM, Alexander Korotkov wrote:
> Hi!
> 
> On Tue, Jul 10, 2018 at 4:39 PM 今井 良一  wrote:
> > On 2018/07/10 20:36, Alexander Korotkov wrote:
> > > Thank you for the experiments!  It seems that there is real regression
> > > here...  BTW, which script were you using in this benchmark:
> > > script_unordered.sql or script_duplicated.sql?
> >
> > Sorry, I forgot to write that. I used script_unordered.sql.
> 
> I've reread the thread.  And I found that in my initial letter [1] I
> forgot to include the index definition.
> CREATE INDEX unordered_i_idx ON unordered (i);
> 
> So, when you did experiments [2], did you define any index on
> unordered table?  If not, then it's curious why there is any
> performance difference at all.
> 
> 1. 
> https://www.postgresql.org/message-id/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com
> 2. 
> https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451%40g01jpexmbkw24

My experiments [1][2] were flawed in that I didn't create an index in the
unordered and duplicated cases.

On Mon, July 9, 2018 at 3:19 AM, I wrote:
> # script_duplicated.sql
> INSERT INTO unordered VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');
I also made a mistake in duplicated.sql: I used the unordered table in the
duplicated experiment...

I re-ran the experiments.
This time I created indexes on the unordered and duplicated tables, and
confirmed with psql's \d command that the indexes exist in both master and
patched.
I used AWS c5.18xlarge, and executed pgbench with 36 clients.

# results with indexes created
master,  ordered,Ave 252796 TPS (253122,252469) 
patched, ordered,Ave 314541 TPS (314011,315070)
master,  unordered,  Ave 409591 TPS (413901,412010,404912,404607,416494,405619) 
patched, unordered,  Ave 412765 TPS (409626,408578,416900,415955)
master,  duplicated, Ave 44811 TPS (45139,44482) 
patched, duplicated, Ave 202325 TPS (201519,203130)

The TPS of "master, ordered" and "patched, ordered" is similar to the TPS of
"master, key1 with 1 value" in the previous experiment [3] (contention
concentrated), and the TPS of "master, unordered" and "patched, unordered" is
similar to the TPS of "master, key1 with 100 values" in the previous
experiment [3] (contention dispersed).
Therefore I think this experiment and the previous experiment [3] that Simon
asked for are correct.

The TPS of "patched, duplicated" was almost 4.5 times that of "master,
duplicated".
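As a quick cross-check (my own, not part of the original mail), the averages
and the "almost 4.5 times" figure can be re-derived from the per-run numbers
listed above:

```python
# Re-derive the averaged TPS figures and the speedup quoted above.
# All raw per-run numbers are copied from the results in this mail.
runs = {
    ("master", "ordered"):    [253122, 252469],
    ("patched", "ordered"):   [314011, 315070],
    ("master", "unordered"):  [413901, 412010, 404912, 404607, 416494, 405619],
    ("patched", "unordered"): [409626, 408578, 416900, 415955],
    ("master", "duplicated"):  [45139, 44482],
    ("patched", "duplicated"): [201519, 203130],
}

def mean(xs):
    return sum(xs) / len(xs)

for (branch, case), vals in runs.items():
    print(f"{branch:>7}, {case:<10} Ave {mean(vals):.0f} TPS")

# The contended "duplicated" case is where the patch helps most.
speedup = mean(runs[("patched", "duplicated")]) / mean(runs[("master", "duplicated")])
print(f"duplicated speedup: {speedup:.1f}x")  # ~4.5x, as stated above
```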


About the small performance regression in the previous experiments [1], I
think it is due to the machine's condition.
I don't have a rational reason or evidence, so I'll just describe what I
thought and did.

In these experiments, in order to stabilize the machine's performance, I
warmed the machine up with the batch below.

## DDL
CREATE UNLOGGED TABLE improve_performance (i integer not null, value text not null);
CREATE INDEX improve_i_idx  on improve_performance (i);

## improve_performance.sql
\set i random(1, 100)
INSERT INTO improve_performance VALUES (:i, 'abcdefghijklmnopqrstuvwxyz');

## test batch
port=$1
client=$2
j=$3
tablenames_all=(ordered unordered duplicated ordered2)
tablenames=(unordered)

function cleaning() {
  for tablename_ in ${tablenames_all[@]}; do
psql -p ${port} -c "truncate table ${tablename_};"
  done

  psql -p ${port} -c "vacuum full analyze;"
}


pgbench -p ${port} -T 60 -P 1 -M prepared -f improve_performance.sql -c 72 -j 36 postgres # warm-up run
cleaning

for tablename in ${tablenames[@]}; do
  for i in `seq 0 1`; do
pgbench -p ${port} -T 60 -P 1 -M prepared -f script_${tablename}.sql -c ${client} -j ${j} postgres
cleaning
  done
done


When I created an index on the improve_performance table and executed the
test batch against the unordered table on the patched version three times,
the results were:
1. 379644, 381409
2. 391089, 391741
3. 383141, 367851

These results are slightly lower than "patched, key1 with 100 values: 405115"
(unordered) in my previous experiments [3].

I thought that warming up with pgbench against a table with an index might
not warm the machine completely, because in experiments [3] I warmed up
against a table without an index.

So I dropped the index from the improve_performance table and executed the
test batch against the patched version two times; the results were:
1. 409626, 408578
2. 416900, 415955

Even within the same test, performance varies widely, so I think a small
regression can be ignored.
Of course, I still wonder why there were regressions in all cases in
experiments [2].
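To put a number on that run-to-run variation (my own back-of-the-envelope
check, using only the runs listed above): the spread between best and worst
run within a single configuration already exceeds the ~1% regression under
discussion.

```python
# Quantify the run-to-run spread for the two sets of runs reported above.
warmed_with_index    = [379644, 381409, 391089, 391741, 383141, 367851]
warmed_without_index = [409626, 408578, 416900, 415955]

def spread_pct(runs):
    # Max-min range as a percentage of the mean TPS.
    mean = sum(runs) / len(runs)
    return 100.0 * (max(runs) - min(runs)) / mean

for label, runs in [("warmed with index", warmed_with_index),
                    ("warmed without index", warmed_without_index)]:
    print(f"{label}: {spread_pct(runs):.1f}% spread between best and worst run")
```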


[1] 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51186E9D%40g01jpexmbkw24
[2] 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451@g01jpexmbkw24
[3] 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A5118C7C3%40g01jpexmbkw24


Yoshikazu Imai




Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Alexander Korotkov
On Wed, Jul 25, 2018 at 5:54 AM Imai, Yoshikazu wrote:
> On Mon, July 9, 2018 at 3:19 AM, Imai, Yoshikazu wrote:
> > I'm planning to do code review and send it in the next mail.
>
> Sorry for the delay in the code review.
>
> I did the code review, and I think there is nothing logically wrong
> with the B-tree code.
>
> I tested the integrity of the B-tree with amcheck just in case.
> I executed this test on a 16-core machine with 64GB of memory.
>
> I ran B-tree insertion and deletion simultaneously and executed
> 'bt_index_parent_check' periodically.
> I also ran B-tree insertion and update simultaneously and executed
> 'bt_index_parent_check' periodically.
>
> ##  DDL
> create table mytab (val1 int, val2 int);
> create index mytabidx on mytab (val1, val2);
>
> ## script_integrity_check_insert.sql
> \set i random(1, 100)
> \set j random(1, 100)
> insert into mytab values (:i, :j);
>
> ## script_integrity_check_delete.sql
> \set i random(1, 100)
> \set j random(1, 100)
> delete from mytab where val1 = :i and val2 = :j
>
> ## script_integrity_check_update.sql
> \set i random(1, 100)
> \set j random(1, 100)
> \set k random(1, 100)
> \set l random(1, 100)
> update mytab set val1 = :k, val2 = :l where val1 = :i and val2 = :j
>
> ## commands(insert, delete, simultaneously executed)
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_insert.sql -c 2 
> postgres
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_delete.sql -c 2 
> postgres
>
> ## commands(insert, update, simultaneously executed)
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_insert.sql -c 2 
> postgres
> pgbench -T 60 -P 1 -M prepared -f script_integrity_check_update.sql -c 2 
> postgres
>
> ## commands(executed during above insert, delete, update)
> (psql) select bt_index_parent_check('mytabidx');
>
>
> Finally, I confirmed that there were no integrity problems during B-tree
> operations.
>
> The remaining task in my review is running the regression tests.

Cool, thank you for review!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Alexander Korotkov
Hi!

On Wed, Jul 25, 2018 at 1:19 PM Simon Riggs  wrote:
> On 13 July 2018 at 03:14, Imai, Yoshikazu wrote:
> > From the attached graph ("some_contention_points_on_leaf_nodes.png"), we
> > can see that as the contention points disperse, TPS increases and the TPS
> > difference between master and the patched version becomes smaller.
>
> So I think this clearly shows the drop in throughput when we have
> index contention and that this patch improves on that situation.
>
> In cases where we don't have contention, the patch doesn't cause a
> negative effect.
>
> So +1 from me!

Thank you, but Imai found another case [1] where the patch causes a small
regression.  For now, it's not completely clear to me what test case was
used, because I forgot to include any index in the initial specification
of this test case [2].  If the regression is confirmed, it would be nice
to mitigate it.  And even if we can't mitigate it and accept it as a
trade-off, it would be nice to at least explain it...

1. 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451%40g01jpexmbkw24
2. 
https://www.postgresql.org/message-id/CAPpHfds%3DWmjv%2Bu7S0peHN2zRrw4C%3DYySn-ZKddp4E7q8KQ18hQ%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Alexander Korotkov
Hi!

On Tue, Jul 10, 2018 at 4:39 PM 今井 良一  wrote:
> On 2018/07/10 20:36, Alexander Korotkov wrote:
> > Thank you for the experiments!  It seems that there is real regression
> > here...  BTW, which script were you using in this benchmark:
> > script_unordered.sql or script_duplicated.sql?
>
> Sorry, I forgot to write that. I used script_unordered.sql.

I've reread the thread.  And I found that in my initial letter [1] I
forgot to include the index definition.
CREATE INDEX unordered_i_idx ON unordered (i);

So, when you did experiments [2], did you define any index on
unordered table?  If not, then it's curious why there is any
performance difference at all.

1. 
https://www.postgresql.org/message-id/CAPpHfduAMDFMNYTCN7VMBsFg_hsf0GqiqXnt%2BbSeaJworwFoig%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/0F97FA9ABBDBE54F91744A9B37151A51189451%40g01jpexmbkw24

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-25 Thread Simon Riggs
On 13 July 2018 at 03:14, Imai, Yoshikazu  wrote:
> On Mon, July 9, 2018 at 5:25 PM, Simon Riggs wrote:
>> Please can you check insertion with the index on 2 keys
>> 1st key has 10,000 values
>> 2nd key has monotonically increasing value from last 1st key value
>>
>> So each session picks one 1st key value
>> Then each new INSERTion is a higher value of 2nd key
>> so 1,1, then 1,2 then 1,3 etc
>>
>> Probably easier to do this with a table like this
>>
>> CREATE UNLOGGED TABLE ordered2 (id integer, logdate timestamp default
>> now(), value text not null, primary key (id, logdate));
>>
>> # script_ordered2.sql
>> \set i random(1, 10000)
>> INSERT INTO ordered2 (id, value) VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
>>
>> Thanks
> I tried to do this, but I might have mistaken your intention, so please
> point it out if I am wrong.
>
> While script_ordered.sql assumes there is one contention point on the
> rightmost leaf node, script_ordered2.sql assumes there are several
> contention points on several leaf nodes, is that right?

Yes, that is right, thank you for testing.

> I experimented with key1 having 10,000 values, but there was no difference
> in the results compared to the unordered.sql one

That is an important result, thanks

> so I experimented with key1 having 1, 2, 3, 5, 10, and 100 values.

OK, good

> From the attached graph ("some_contention_points_on_leaf_nodes.png"), we
> can see that as the contention points disperse, TPS increases and the TPS
> difference between master and the patched version becomes smaller.

So I think this clearly shows the drop in throughput when we have
index contention and that this patch improves on that situation.

In cases where we don't have contention, the patch doesn't cause a
negative effect.

So +1 from me!

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Locking B-tree leafs immediately in exclusive mode

2018-07-24 Thread Imai, Yoshikazu

On Mon, July 9, 2018 at 3:19 AM, Imai, Yoshikazu wrote:
> I'm planning to do code review and send it in the next mail.

Sorry for the delay in the code review.

I did the code review, and I think there is nothing logically wrong
with the B-tree code.

I tested the integrity of the B-tree with amcheck just in case.
I executed this test on a 16-core machine with 64GB of memory.

I ran B-tree insertion and deletion simultaneously and executed
'bt_index_parent_check' periodically.
I also ran B-tree insertion and update simultaneously and executed
'bt_index_parent_check' periodically.

##  DDL
create table mytab (val1 int, val2 int);
create index mytabidx on mytab (val1, val2);

## script_integrity_check_insert.sql
\set i random(1, 100)
\set j random(1, 100)
insert into mytab values (:i, :j);

## script_integrity_check_delete.sql
\set i random(1, 100)
\set j random(1, 100)
delete from mytab where val1 = :i and val2 = :j

## script_integrity_check_update.sql
\set i random(1, 100)
\set j random(1, 100)
\set k random(1, 100)
\set l random(1, 100)
update mytab set val1 = :k, val2 = :l where val1 = :i and val2 = :j

## commands(insert, delete, simultaneously executed)
pgbench -T 60 -P 1 -M prepared -f script_integrity_check_insert.sql -c 2 
postgres
pgbench -T 60 -P 1 -M prepared -f script_integrity_check_delete.sql -c 2 
postgres

## commands(insert, update, simultaneously executed)
pgbench -T 60 -P 1 -M prepared -f script_integrity_check_insert.sql -c 2 
postgres
pgbench -T 60 -P 1 -M prepared -f script_integrity_check_update.sql -c 2 
postgres

## commands(executed during above insert, delete, update)
(psql) select bt_index_parent_check('mytabidx');


Finally, I confirmed that there were no integrity problems during B-tree
operations.

The remaining task in my review is running the regression tests.


Yoshikazu Imai



RE: Locking B-tree leafs immediately in exclusive mode

2018-07-12 Thread Imai, Yoshikazu
On Mon, July 9, 2018 at 5:25 PM, Simon Riggs wrote:
> Please can you check insertion with the index on 2 keys
> 1st key has 10,000 values
> 2nd key has monotonically increasing value from last 1st key value
> 
> So each session picks one 1st key value
> Then each new INSERTion is a higher value of 2nd key
> so 1,1, then 1,2 then 1,3 etc
> 
> Probably easier to do this with a table like this
> 
> CREATE UNLOGGED TABLE ordered2 (id integer, logdate timestamp default
> now(), value text not null, primary key (id, logdate));
> 
> # script_ordered2.sql
> \set i random(1, 10000)
> INSERT INTO ordered2 (id, value) VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
> 
> Thanks
I tried to do this, but I might have mistaken your intention, so please
point it out if I am wrong.

While script_ordered.sql assumes there is one contention point on the
rightmost leaf node, script_ordered2.sql assumes there are several
contention points on several leaf nodes, is that right?
I experimented with key1 having 10,000 values, but there was no difference
in the results compared to the unordered.sql one, so I experimented with
key1 having 1, 2, 3, 5, 10, and 100 values.
Also, if I created a primary key, "ERROR:  duplicate key value violates
unique constraint "ordered2_pkey"" occurred, so I created a non-unique index.

#DDL
CREATE UNLOGGED TABLE ordered2 (id integer, logdate timestamp default now(), 
value text not null);
CREATE INDEX ordered2_key ON ordered2 (id, logdate);

# script_ordered2.sql
\set i random(1, 100)  #second value is 1, 2, 3, 5, 10, or 100
INSERT INTO ordered2 (id, value) VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');

# ordered2 results, key1 having 1, 2, 3, 5, 10, and 100 values
master,  key1 with 1 values:  236428
master,  key1 with 2 values:  292248
master,  key1 with 3 values:  340980
master,  key1 with 5 values:  362808
master,  key1 with 10 values: 379525
master,  key1 with 100 values: 405265

patched, key1 with 1 values:  295862
patched, key1 with 2 values:  339538
patched, key1 with 3 values:  355793
patched, key1 with 5 values:  371332
patched, key1 with 10 values: 387731
patched, key1 with 100 values: 405115


From the attached graph ("some_contention_points_on_leaf_nodes.png"), we can
see that as the contention points disperse, TPS increases and the TPS
difference between master and the patched version becomes smaller.
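One way to see this trend without the graph (my own sketch, using only the
ordered2 numbers listed in this mail) is to express the results as
patched/master ratios per key1 cardinality:

```python
# The ordered2 results above, as patched/master TPS ratios per number of
# distinct key1 values.  As contention disperses across more leaf pages,
# the benefit of the patch shrinks toward 1.0x.
key1_values = [1, 2, 3, 5, 10, 100]
master_tps  = [236428, 292248, 340980, 362808, 379525, 405265]
patched_tps = [295862, 339538, 355793, 371332, 387731, 405115]

ratios = [p / m for p, m in zip(patched_tps, master_tps)]
for n, r in zip(key1_values, ratios):
    print(f"key1 with {n:>3} values: patched/master = {r:.3f}")
```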


Yoshikazu Imai


Re: Locking B-tree leafs immediately in exclusive mode

2018-07-10 Thread 今井 良一
On 2018/07/10 20:36, Alexander Korotkov wrote:
> On Tue, Jul 10, 2018 at 2:19 PM Imai, Yoshikazu wrote:
>> On Mon, July 9, 2018 at 4:44 PM, Alexander Korotkov wrote:
>>>> Firstly, I did performance tests on a 72-core machine (AWS c5.18xlarge),
>>>> same as you did.
>>>
>>> OK.  But note that c5.18xlarge is a 72-vCPU machine, which AFAIK is
>>> close to the performance of 36 physical cores.
>>
>> Thanks for pointing that. I referred to /proc/cpuinfo and understood there 
>> are 36 physical cores.
>>
>>> In this case it also looks like we observed a 1% regression.  Although 1%
>>> may seem very small, I think we should clarify whether it really
>>> exists.  I have at least two hypotheses about this.
>>>
>>> 1) There is no real regression; the observed difference in TPS is less
>>> than the measurement error.  In order to check that, we need to retry
>>> the experiment multiple times.  Also, if you run the benchmark on master
>>> before the patched version (or vice versa), you should also try to swap
>>> the order to make sure the order of benchmarks has no influence.
>>> 2) If we consider the relation between TPS and the number of clients, TPS
>>> typically grows with an increasing number of clients until it reaches some
>>> saturation value.  After the saturation value, there is some
>>> degradation of TPS.  If the patch lowers some latency, that may cause
>>> saturation to happen earlier.  In order to check that, we need to run
>>> benchmarks with various numbers of clients and draw a graph: TPS
>>> depending on clients.
>>>
>>> So, may I ask you to run more experiments in order to clarify the
>>> observed regression?
>>
>> I ran experiment 2), varying the clients parameter over 18, 36, 54, and 72.
>> While doing the experiment, I realized that the results of pgbench with 36
>> clients improve after executing pgbench with 72 clients.
>> I don't know why this occurs, but anyway, in this experiment, I executed
>> pgbench with 72 clients before executing the other pgbench runs (e.g. -c 72,
>> -c 18, -c 36, -c 54, -c 72).
>> I ran the experiments against master and patched in mixed order (e.g.
>> master, patched, patched, master, master, patched, patched, master).
>>
>> # results of changing clients(18, 36, 54, 72 clients)
>> master, -c 18 -j 18:  Ave 400410 TPS (407615,393942,401845,398241)
>> master, -c 36 -j 36:  Ave 415616 TPS (411939,400742,424855,424926)
>> master, -c 54 -j 54:  Ave 378734 TPS (401646,354084,408044,351163)
>> master, -c 72 -j 72:  Ave 360864 TPS (367718,360029,366432,349277)
>> patched, -c 18 -j 18: Ave 393115 TPS (382854,396396,395530,397678)
>> patched, -c 36 -j 36: Ave 390328 TPS (376100,404873,383498,396840)
>> patched, -c 54 -j 54: Ave 364894 TPS (365533,373064,354250,366727)
>> patched, -c 72 -j 72: Ave 353982 TPS (355237,357601,345536,357553)
>>
>> Saturation seems to occur between 18 and 36 clients, so I also experimented
>> with 27 clients.
>>
>> # results of changing clients(27 clients)
>> master, -c 27 -j 27:  Ave 416756 (423512,424241,399241,420030) TPS
>> patched, -c 27 -j 27: Ave 413568 (410187,404291,420152,419640) TPS
>>
>> I created a graph and attached it to this mail ("detecting saturation.png").
>> Referring to the graph, the patched version saturates earlier than master,
>> as you expected.
>> But I think even the patched version's near-saturation TPS shows a small
>> regression relative to master's.
>>
>> Are there other experiments to do about this?
> 
> Thank you for the experiments!  It seems that there is real regression
> here...  BTW, which script were you using in this benchmark:
> script_unordered.sql or script_duplicated.sql?

Sorry, I forgot to write that. I used script_unordered.sql.


Yoshikazu Imai


Re: Locking B-tree leafs immediately in exclusive mode

2018-07-10 Thread Alexander Korotkov
On Tue, Jul 10, 2018 at 2:19 PM Imai, Yoshikazu wrote:
> On Mon, July 9, 2018 at 4:44 PM, Alexander Korotkov wrote:
> > > Firstly, I did performance tests on a 72-core machine (AWS c5.18xlarge),
> > > same as you did.
> >
> > OK.  But note that c5.18xlarge is a 72-vCPU machine, which AFAIK is
> > close to the performance of 36 physical cores.
>
> Thanks for pointing that. I referred to /proc/cpuinfo and understood there 
> are 36 physical cores.
>
> > In this case it also looks like we observed a 1% regression.  Although 1%
> > may seem very small, I think we should clarify whether it really
> > exists.  I have at least two hypotheses about this.
> >
> > 1) There is no real regression; the observed difference in TPS is less
> > than the measurement error.  In order to check that, we need to retry
> > the experiment multiple times.  Also, if you run the benchmark on master
> > before the patched version (or vice versa), you should also try to swap
> > the order to make sure the order of benchmarks has no influence.
> > 2) If we consider the relation between TPS and the number of clients, TPS
> > typically grows with an increasing number of clients until it reaches some
> > saturation value.  After the saturation value, there is some
> > degradation of TPS.  If the patch lowers some latency, that may cause
> > saturation to happen earlier.  In order to check that, we need to run
> > benchmarks with various numbers of clients and draw a graph: TPS
> > depending on clients.
> >
> > So, may I ask you to run more experiments in order to clarify the
> > observed regression?
>
> I ran experiment 2), varying the clients parameter over 18, 36, 54, and 72.
> While doing the experiment, I realized that the results of pgbench with 36
> clients improve after executing pgbench with 72 clients.
> I don't know why this occurs, but anyway, in this experiment, I executed
> pgbench with 72 clients before executing the other pgbench runs (e.g. -c 72,
> -c 18, -c 36, -c 54, -c 72).
> I ran the experiments against master and patched in mixed order (e.g.
> master, patched, patched, master, master, patched, patched, master).
>
> # results of changing clients(18, 36, 54, 72 clients)
> master, -c 18 -j 18:  Ave 400410 TPS (407615,393942,401845,398241)
> master, -c 36 -j 36:  Ave 415616 TPS (411939,400742,424855,424926)
> master, -c 54 -j 54:  Ave 378734 TPS (401646,354084,408044,351163)
> master, -c 72 -j 72:  Ave 360864 TPS (367718,360029,366432,349277)
> patched, -c 18 -j 18: Ave 393115 TPS (382854,396396,395530,397678)
> patched, -c 36 -j 36: Ave 390328 TPS (376100,404873,383498,396840)
> patched, -c 54 -j 54: Ave 364894 TPS (365533,373064,354250,366727)
> patched, -c 72 -j 72: Ave 353982 TPS (355237,357601,345536,357553)
>
> Saturation seems to occur between 18 and 36 clients, so I also experimented
> with 27 clients.
>
> # results of changing clients(27 clients)
> master, -c 27 -j 27:  Ave 416756 (423512,424241,399241,420030) TPS
> patched, -c 27 -j 27: Ave 413568 (410187,404291,420152,419640) TPS
>
> I created a graph and attached it to this mail ("detecting saturation.png").
> Referring to the graph, the patched version saturates earlier than master,
> as you expected.
> But I think even the patched version's near-saturation TPS shows a small
> regression relative to master's.
>
> Are there other experiments to do about this?

Thank you for the experiments!  It seems that there is a real regression
here...  BTW, which script were you using in this benchmark:
script_unordered.sql or script_duplicated.sql?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-10 Thread Alexander Korotkov
On Mon, Jul 9, 2018 at 8:18 PM Peter Geoghegan  wrote:
> On Mon, Jul 9, 2018 at 9:43 AM, Alexander Korotkov
>  wrote:
> > In this case it also looks like we observed a 1% regression.  Although 1%
> > may seem very small, I think we should clarify whether it really
> > exists.  I have at least two hypotheses about this.
> >
> > 1) There is no real regression: the observed difference in TPS is less
> > than the measurement error.  In order to check that, we need to retry
> > the experiment multiple times.  Also, if you run the benchmark on master
> > before the patched version (or vice versa), you should also try to swap
> > the order to make sure there is no influence from the order of benchmarks.
> > 2) If we consider the relation between TPS and the number of clients,
> > TPS typically grows with an increasing number of clients until it reaches
> > some saturation value.  After the saturation value, there is some
> > degradation of TPS.  If the patch makes some latency lower, that may cause
> > saturation to happen earlier.  In order to check that, we need to run
> > benchmarks with various numbers of clients and draw a graph: TPS
> > depending on clients.
> >
> > So, may I ask you to make more experiments in order to clarify the
> > observed regression?
>
> It would be nice to actually see script_duplicated.sql. I don't know
> exactly what the test case was.

BTW, the contents of script_duplicated.sql were posted by Imai, Yoshikazu
in the message body:

> # script_duplicated.sql
> INSERT INTO unordered VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



RE: Locking B-tree leafs immediately in exclusive mode

2018-07-10 Thread Imai, Yoshikazu
On Mon, July 9, 2018 at 4:44 PM, Alexander Korotkov wrote:
> > Firstly, I did performance tests on a 72-core machine (AWS c5.18xlarge),
> > the same as you did.
> 
> OK.  But note that c5.18xlarge is a 72-vCPU machine, which AFAIK is
> close to the performance of 36 physical cores.

Thanks for pointing that out.  I referred to /proc/cpuinfo and confirmed there
are 36 physical cores.

> In this case it also looks like we observed a 1% regression.  Although 1%
> may seem very small, I think we should clarify whether it really
> exists.  I have at least two hypotheses about this.
> 
> 1) There is no real regression: the observed difference in TPS is less
> than the measurement error.  In order to check that, we need to retry
> the experiment multiple times.  Also, if you run the benchmark on master
> before the patched version (or vice versa), you should also try to swap
> the order to make sure there is no influence from the order of benchmarks.
> 2) If we consider the relation between TPS and the number of clients,
> TPS typically grows with an increasing number of clients until it reaches
> some saturation value.  After the saturation value, there is some
> degradation of TPS.  If the patch makes some latency lower, that may cause
> saturation to happen earlier.  In order to check that, we need to run
> benchmarks with various numbers of clients and draw a graph: TPS
> depending on clients.
> 
> So, may I ask you to make more experiments in order to clarify the
> observed regression?

I ran experiment 2), varying the clients parameter over 18, 36, 54, and 72.
While doing the experiment, I realized that the results of pgbench with 36
clients improve after executing pgbench with 72 clients.
I don't know why this occurs, but anyway, in this experiment, I executed
pgbench with 72 clients before the other pgbench runs. (e.g. -c 72, -c 18, -c
36, -c 54, -c 72)
I ran the experiments against master and patched in random order (e.g. master,
patched, patched, master, master, patched, patched, master).

# results of changing clients (18, 36, 54, 72 clients)
master, -c 18 -j 18:  Ave 400410 TPS (407615,393942,401845,398241) 
master, -c 36 -j 36:  Ave 415616 TPS (411939,400742,424855,424926)
master, -c 54 -j 54:  Ave 378734 TPS (401646,354084,408044,351163)
master, -c 72 -j 72:  Ave 360864 TPS (367718,360029,366432,349277)
patched, -c 18 -j 18: Ave 393115 TPS (382854,396396,395530,397678)
patched, -c 36 -j 36: Ave 390328 TPS (376100,404873,383498,396840)
patched, -c 54 -j 54: Ave 364894 TPS (365533,373064,354250,366727)
patched, -c 72 -j 72: Ave 353982 TPS (355237,357601,345536,357553)

It seems saturation happens between 18 and 36 clients, so I also experimented
with 27 clients.

# results of changing clients (27 clients)
master, -c 27 -j 27:  Ave 416756 (423512,424241,399241,420030) TPS
patched, -c 27 -j 27: Ave 413568 (410187,404291,420152,419640) TPS

I created a graph and attached it to this mail ("detecting saturation.png").
Referring to the graph, the patched version's saturation happens earlier than
master's, as you expected.
But even the patched version's near-saturation TPS shows a small regression
relative to master's, I think.

Are there other experiments to do about this?
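For reference, the per-client-count difference implied by the averages reported in this message can be computed directly (a quick sketch; the TPS numbers are copied verbatim from the results above):

```python
# Average TPS per client count, copied from the pgbench runs in this message.
master = {18: 400410, 27: 416756, 36: 415616, 54: 378734, 72: 360864}
patched = {18: 393115, 27: 413568, 36: 390328, 54: 364894, 72: 353982}

# Relative change of patched vs. master, in percent (negative = regression).
for clients in sorted(master):
    delta = 100.0 * (patched[clients] - master[clients]) / master[clients]
    print(f"{clients:>2} clients: {delta:+.2f}%")
```

This makes it easy to see whether the per-point deltas stay within the spread of the four individual runs, i.e. within measurement error.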


Yoshikazu Imai




Re: Locking B-tree leafs immediately in exclusive mode

2018-07-09 Thread Peter Geoghegan
On Mon, Jul 9, 2018 at 3:23 PM, Alexander Korotkov
 wrote:
> I'm sorry, but I didn't understand this guess.  I agree that moving
> right within _bt_findinsertloc() might be worse than moving right
> within _bt_moveright().  But why should it happen more often, given that
> both with and without the patch it happens only after _bt_moveright() in
> exclusive mode (with the patch, _bt_search() calls _bt_moveright() in
> exclusive mode)?

_bt_moveright() doesn't serialize access, because it doesn't couple
buffer locks. I think that it's possible that it's slower for that
reason -- not because it moves right more often.

"The Convoy Phenomenon" research report may be worth a read sometime,
though it's very old. It's co-authored by Jim Gray. As they put it,
sometimes "a non-FIFO strategy" can be faster. In simple terms,
sometimes it can be faster to randomly grant a lock, since in practice
nobody gets starved out. I'm guessing (and it is very much a guess)
that it could be something like that, since the behavior of
_bt_findinsertloc() can be FIFO-like, whereas the behavior of
_bt_moveright() may not be.

Again, the actual queries would help a lot. It's just a guess.

-- 
Peter Geoghegan



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-09 Thread Alexander Korotkov
On Mon, Jul 9, 2018 at 8:18 PM Peter Geoghegan  wrote:
> On Mon, Jul 9, 2018 at 9:43 AM, Alexander Korotkov
>  wrote:
> > In this case it also looks like we observed a 1% regression.  Although 1%
> > may seem very small, I think we should clarify whether it really
> > exists.  I have at least two hypotheses about this.
> >
> > 1) There is no real regression: the observed difference in TPS is less
> > than the measurement error.  In order to check that, we need to retry
> > the experiment multiple times.  Also, if you run the benchmark on master
> > before the patched version (or vice versa), you should also try to swap
> > the order to make sure there is no influence from the order of benchmarks.
> > 2) If we consider the relation between TPS and the number of clients,
> > TPS typically grows with an increasing number of clients until it reaches
> > some saturation value.  After the saturation value, there is some
> > degradation of TPS.  If the patch makes some latency lower, that may cause
> > saturation to happen earlier.  In order to check that, we need to run
> > benchmarks with various numbers of clients and draw a graph: TPS
> > depending on clients.
> >
> > So, may I ask you to make more experiments in order to clarify the
> > observed regression?
>
> It would be nice to actually see script_duplicated.sql. I don't know
> exactly what the test case was.
>
> Here is my wild guess: You may end up moving right more often within
> _bt_findinsertloc(), which is actually worse than moving right within
> _bt_moveright(), even when you _bt_moveright() in exclusive mode.
> _bt_findinsertloc() couples/crabs exclusive buffer locks because the
> unique case requires it, even when we're not inserting into a unique
> index. Whereas _bt_moveright() holds at most one buffer lock at a
> time.

I'm sorry, but I didn't understand this guess.  I agree that moving
right within _bt_findinsertloc() might be worse than moving right
within _bt_moveright().  But why should it happen more often, given that
both with and without the patch it happens only after _bt_moveright() in
exclusive mode (with the patch, _bt_search() calls _bt_moveright() in
exclusive mode)?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-09 Thread Simon Riggs
On 9 July 2018 at 04:18, Imai, Yoshikazu  wrote:

>> # script_ordered.sql
>> INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz');
>>
>> # script_unordered.sql
>> \set i random(1, 100)
>> INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');


>> # results
>> ordered, master: 157473 TPS
>> ordered, patched 231374 TPS
>> unordered, master: 232372 TPS
>> unordered, patched: 232535 TPS
>
> # my results
> ordered, master: 186045 TPS
> ordered, patched:265842 TPS
> unordered, master:   313011 TPS
> unordered, patched:  309636 TPS
>
> I confirmed that this patch improves ordered insertion.

Looks good

Please can you check insertion with an index on 2 keys:
1st key has 10,000 values
2nd key has a monotonically increasing value from the last 1st-key value

So each session picks one 1st-key value.
Then each new INSERT is a higher value of the 2nd key,
so 1,1, then 1,2, then 1,3, etc.

Probably easier to do this with a table like this

CREATE UNLOGGED TABLE ordered2 (id integer, logdate timestamp default
now(), value text not null, primary key (id, logdate));

# script_ordered2.sql
\set i random(1, 10000)
INSERT INTO ordered2 (id, value) VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');

Thanks

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-09 Thread Peter Geoghegan
On Mon, Jul 9, 2018 at 9:43 AM, Alexander Korotkov
 wrote:
> In this case it also looks like we observed a 1% regression.  Although 1%
> may seem very small, I think we should clarify whether it really
> exists.  I have at least two hypotheses about this.
> 
> 1) There is no real regression: the observed difference in TPS is less
> than the measurement error.  In order to check that, we need to retry
> the experiment multiple times.  Also, if you run the benchmark on master
> before the patched version (or vice versa), you should also try to swap
> the order to make sure there is no influence from the order of benchmarks.
> 2) If we consider the relation between TPS and the number of clients,
> TPS typically grows with an increasing number of clients until it reaches
> some saturation value.  After the saturation value, there is some
> degradation of TPS.  If the patch makes some latency lower, that may cause
> saturation to happen earlier.  In order to check that, we need to run
> benchmarks with various numbers of clients and draw a graph: TPS
> depending on clients.
> 
> So, may I ask you to make more experiments in order to clarify the
> observed regression?

It would be nice to actually see script_duplicated.sql. I don't know
exactly what the test case was.

Here is my wild guess: You may end up moving right more often within
_bt_findinsertloc(), which is actually worse than moving right within
_bt_moveright(), even when you _bt_moveright() in exclusive mode.
_bt_findinsertloc() couples/crabs exclusive buffer locks because the
unique case requires it, even when we're not inserting into a unique
index. Whereas _bt_moveright() holds at most one buffer lock at a
time.
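The difference between the two strategies can be sketched with a toy model (invented Python names, not PostgreSQL code): a lock-coupled walk, as in _bt_findinsertloc(), holds the current page's lock until the right sibling's lock is acquired, while a _bt_moveright()-style walk never holds more than one lock at a time.

```python
# Toy illustration of lock coupling vs. single-lock move-right
# (names invented for illustration; this is not PostgreSQL code).

def move_right_coupled(pages, start):
    """Lock-coupled walk: acquire the right sibling's lock before releasing
    the current page's lock.  Returns the maximum locks held at once."""
    held = {start}
    max_held = 1
    pos = start
    while pages[pos] is not None:      # follow right-links
        nxt = pages[pos]
        held.add(nxt)                  # acquire the next lock first...
        max_held = max(max_held, len(held))
        held.remove(pos)               # ...then release the current one
        pos = nxt
    return max_held

def move_right_uncoupled(pages, start):
    """_bt_moveright()-style walk: release before re-acquiring, so at most
    one lock is held at any time."""
    pos = start
    while pages[pos] is not None:
        pos = pages[pos]               # release current, then lock sibling
    return 1

# A chain of three pages linked by right-links: 0 -> 1 -> 2 -> (none).
chain = {0: 1, 1: 2, 2: None}
print(move_right_coupled(chain, 0))    # peaks at two locks while stepping
print(move_right_uncoupled(chain, 0))  # never more than one
```

The coupled walk serializes concurrent right-moving insertions behind each other, which is the FIFO-like behavior guessed at above.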

-- 
Peter Geoghegan



Re: Locking B-tree leafs immediately in exclusive mode

2018-07-09 Thread Alexander Korotkov
Hi!

On Mon, Jul 9, 2018 at 6:19 AM Imai, Yoshikazu
 wrote:
> Hi, I'm reviewing this.

Great.  Thank you for giving attention to this patch.

> Firstly, I did performance tests on a 72-core machine (AWS c5.18xlarge),
> the same as you did.

OK.  But note that c5.18xlarge is a 72-vCPU machine, which AFAIK is close
to the performance of 36 physical cores.

> > # results
> > ordered, master: 157473 TPS
> > ordered, patched 231374 TPS
> > unordered, master: 232372 TPS
> > unordered, patched: 232535 TPS
>
> # my results
> ordered, master: 186045 TPS
> ordered, patched:265842 TPS
> unordered, master:   313011 TPS
> unordered, patched:  309636 TPS
>
> I confirmed that this patch improves ordered insertion.

Good, but it seems like a 1% regression in the unordered case.

> In addition to your tests, I did the following tests with many duplicate 
> entries
>
> #  DDL
> CREATE UNLOGGED TABLE duplicated (i integer not null, value text not null);
>
> # script_duplicated.sql
> INSERT INTO unordered VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');
>
> # commands
> pgbench -T 60 -P 1 -M prepared -f script_duplicated.sql -c 150 -j 150 postgres
>
> # results
> duplicated, master:  315450 TPS
> duplicated, patched: 311584 TPS
>
> It seems there is almost no performance degradation in the case of many
> duplicated entries.

In this case it also looks like we observed a 1% regression.  Although 1%
may seem very small, I think we should clarify whether it really
exists.  I have at least two hypotheses about this.

1) There is no real regression: the observed difference in TPS is less
than the measurement error.  In order to check that, we need to retry
the experiment multiple times.  Also, if you run the benchmark on master
before the patched version (or vice versa), you should also try to swap
the order to make sure there is no influence from the order of benchmarks.
2) If we consider the relation between TPS and the number of clients,
TPS typically grows with an increasing number of clients until it reaches
some saturation value.  After the saturation value, there is some
degradation of TPS.  If the patch makes some latency lower, that may cause
saturation to happen earlier.  In order to check that, we need to run
benchmarks with various numbers of clients and draw a graph: TPS
depending on clients.

So, may I ask you to make more experiments in order to clarify the
observed regression?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



RE: Locking B-tree leafs immediately in exclusive mode

2018-07-08 Thread Imai, Yoshikazu
On Mon, June 11, 2018 at 4:31 PM, Alexander Korotkov wrote:
> On Mon, Jun 11, 2018 at 1:06 PM Simon Riggs  wrote:
> > On 5 June 2018 at 14:45, Alexander Korotkov 
> > wrote:
> > > Currently _bt_search() always locks the leaf buffer in shared mode (aka
> > > BT_READ), while the caller can relock it later.  However, I don't see
> > > what prevents _bt_search() from locking the leaf immediately in
> > > exclusive mode (aka BT_WRITE) when required.
> > > When traversing the downlink from a non-leaf page of level 1 (the
> > > lowest level of non-leaf pages, just prior to leaf pages), we know that
> > > the next page is going to be a leaf.  In this case, we can immediately
> > > lock the next page in exclusive mode if required.
> > > For sure, it might happen that we didn't manage to exclusively lock the
> > > leaf in this way when _bt_getroot() points us to a leaf page.  But this
> > > case could be handled in _bt_search() by relocking.  Please find the
> > > implementation of this approach in the attached patch.
> >
> > It's a good idea. How does it perform with many duplicate entries?
> 
> Our B-tree currently maintains duplicates unordered.  So, during
> insertion we can traverse rightlinks in order to find a page which would
> fit the new index tuple.
> However, in this case we're traversing pages in exclusive lock mode, and
> that happens already after the re-lock.  _bt_search() doesn't do anything
> with that.
> So, I assume there shouldn't be any degradation in the case of many
> duplicate entries.
> 
> But this case is definitely worth testing, and I'm planning to do it.
> 


Hi, I'm reviewing this.

Firstly, I did performance tests on a 72-core machine (AWS c5.18xlarge), the
same as you did.

> # postgresql.conf
> max_connections = 300
> shared_buffers = 32GB
> fsync = off
> synchronous_commit = off
> 
> 
> 
> #  DDL:
> CREATE UNLOGGED TABLE ordered (id serial primary key, value text not null);
> CREATE UNLOGGED TABLE unordered (i integer not null, value text not null);
> 
> 
> 
> # script_ordered.sql
> INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz');
> 
> 
> 
> # script_unordered.sql
> \set i random(1, 100)
> INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz');
> 
> 
> 
> # commands
> pgbench -T 60 -P 1 -M prepared -f script_ordered.sql -c 150 -j 150 postgres
> pgbench -T 60 -P 1 -M prepared -f script_unordered.sql -c 150 -j 150 postgres
> 
> 
> 
> # results
> ordered, master: 157473 TPS
> ordered, patched 231374 TPS
> unordered, master: 232372 TPS
> unordered, patched: 232535 TPS

# my results
ordered, master: 186045 TPS
ordered, patched:265842 TPS
unordered, master:   313011 TPS
unordered, patched:  309636 TPS

I confirmed that this patch improves ordered insertion.


In addition to your tests, I did the following tests with many duplicate entries

#  DDL
CREATE UNLOGGED TABLE duplicated (i integer not null, value text not null);

# script_duplicated.sql
INSERT INTO unordered VALUES (1, 'abcdefghijklmnoprsqtuvwxyz');

# commands
pgbench -T 60 -P 1 -M prepared -f script_duplicated.sql -c 150 -j 150 postgres

# results
duplicated, master:  315450 TPS
duplicated, patched: 311584 TPS

It seems there is almost no performance degradation in the case of many
duplicated entries.


I'm planning to do code review and send it in the next mail.

Yoshikazu Imai




Re: Locking B-tree leafs immediately in exclusive mode

2018-06-14 Thread Alexander Korotkov
On Thu, Jun 14, 2018 at 6:32 PM Peter Geoghegan  wrote:
> On Thu, Jun 14, 2018 at 5:43 AM, Alexander Korotkov
>  wrote:
> > However, that doesn't
> > look like inevitable shortcoming, because we could store heap TID in
> > t_tid of pivot index tuples.
>
> But the offset doesn't have enough space for an entire TID.

Sorry, my bad.  It slipped my mind that we use t_tid.ip_blkid of pivot
tuples to store downlinks :)
So, yes, an additional attribute is required to store the heap TID in a pivot tuple.

Anyway, I'm looking forward to seeing your patch posted, even if it's
not yet in perfect shape.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-06-14 Thread Peter Geoghegan
On Thu, Jun 14, 2018 at 5:43 AM, Alexander Korotkov
 wrote:
> Could you, please, clarify what you mean by "fan-in"?  As I
> understood, higher "fan-in" means more children on a single non-leaf
> page, and in turn "better branching".  Is my understanding correct?

Yes, your understanding is correct.

> If my understanding is correct, then making the leaf level truly unique
> without suffix truncation will kill fan-in, because the additional heap
> TID attribute will increase pivot tuple size.

You're right that that's theoretically possible. However, in practice
we can almost always truncate away the heap TID attribute from pivot
tuples. We only need the heap TID attribute when there is no other
distinguishing attribute.

> However, that doesn't
> look like an inevitable shortcoming, because we could store the heap TID
> in t_tid of pivot index tuples.

But the offset doesn't have enough space for an entire TID.

> So, my idea is that it's not necessary to couple suffix truncation with
> making leaf tuples unique.

It seems like by far the simplest way. I don't really care about
suffix truncation, though. I care about making the leaf tuples unique.
Suffix truncation can help.

> Regarding just making leaf tuples unique,
> I understand that it wouldn't always be a win.  When we have a lot of
> duplicates, the current implementation searches among the pages containing
> those duplicates for the first page containing enough free space.
> While with a unique index, we would always have to insert into a
> particular page.  Thus, the current implementation gives us better filling
> of pages, and (probably) faster insertion.

You're definitely right that that's the weakness of the design.

>  But, a unique index would
> give us much better performance for retail tuple delete and update
> (currently we support neither).  But I believe that future
> pluggable table access methods will use both.

Right.

> Therefore, my idea is that there is a tradeoff between making a btree
> index unique or non-unique.  I think we will need an option for that.
> I'm not yet sure if this option should be user-visible or an
> auto-decision based on the table access method used.

I agree.

-- 
Peter Geoghegan



Re: Locking B-tree leafs immediately in exclusive mode

2018-06-14 Thread Alexander Korotkov
On Thu, Jun 14, 2018 at 4:56 PM Claudio Freire  wrote:
> Not at all. Insertion cost in unique indexes with lots of duplicates
> (this happens -- e.g. dead duplicates) grows quadratically in the number
> of duplicates, and that's improved by making the index unique and sorted.

Sorry, I've messed up the terms.  I actually compared current
non-unique indexes with non-unique indexes keeping duplicate entries
ordered by TID (which makes them somewhat unique).  I didn't really
consider indexes which enforce unique constraints.  For them,
insertion cost grows quadratically (as you mentioned) regardless of
whether we keep duplicates ordered by TID or not.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-06-14 Thread Claudio Freire
On Thu, Jun 14, 2018 at 9:44 AM Alexander Korotkov
 wrote:
> > > Our B-tree currently maintains duplicates unordered.  So, during
> > > insertion we can traverse rightlinks in order to find a page which
> > > would fit the new index tuple.
> > > However, in this case we're traversing pages in exclusive lock mode,
> > > and that happens already after the re-lock.  _bt_search() doesn't do
> > > anything with that.
> > > So, I assume there shouldn't be any degradation in the case of many
> > > duplicate entries.
> >
> > BTW, I have a prototype patch that makes the keys at the leaf level
> > unique. It is actually an implementation of nbtree suffix truncation
> > that sometimes *adds* a new attribute in pivot tuples, rather than
> > truncating away non-distinguishing attributes -- it adds a special
> > heap TID attribute when it must. The patch typically increases fan-in,
> > of course, but the real goal was to make all entries at the leaf level
> > truly unique, as L&Y intended (we cannot do it without suffix
> > truncation because that will kill fan-in). My prototype has full
> > amcheck coverage, which really helped me gain confidence in my
> > approach.
>
> Could you, please, clarify what you mean by "fan-in"?  As I
> understood, higher "fan-in" means more children on a single non-leaf
> page, and in turn "better branching".  Is my understanding correct?
>
> If my understanding is correct, then making the leaf level truly unique
> without suffix truncation will kill fan-in, because the additional heap
> TID attribute will increase pivot tuple size.  However, that doesn't
> look like an inevitable shortcoming, because we could store the heap TID
> in t_tid of pivot index tuples.  Yes, we've used the t_tid offset for
> storing the number of attributes in truncated tuples.  But the heap TID is
> going to be virtually the last attribute of the tuple.  So, pivot tuples
> containing a heap TID are not going to be suffix-truncated anyway.  So,
> for such tuples we can simply not set INDEX_ALT_TID_MASK, and they would
> be assumed to have all the attributes.

I had a patch[1] that did pretty much just that. I saw no contention
issues or adverse effects, but my testing of it was rather limited.

The patch has rotted significantly since then, sadly, and it's quite complex.

> So, my idea is that it's not necessary to couple suffix truncation with
> making leaf tuples unique.

No, but the work involved in one and the other overlaps so much that it
lends itself to being done in the same patch.

> Regarding just making leaf tuples unique,
> I understand that it wouldn't always be a win.  When we have a lot of
> duplicates, the current implementation searches among the pages containing
> those duplicates for the first page containing enough free space.
> While with a unique index, we would always have to insert into a
> particular page.  Thus, the current implementation gives us better filling
> of pages, and (probably) faster insertion.

Not at all. Insertion cost in unique indexes with lots of duplicates
(this happens -- e.g. dead duplicates) grows quadratically in the number
of duplicates, and that's improved by making the index unique and sorted.
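The quadratic growth can be sketched with a toy cost model (an illustration only, not PostgreSQL code, and a simplification of the real unique-check behavior): if the i-th insert has to scan past all i existing duplicates of its key before finding its slot, the total work is the sum 0 + 1 + ... + (n-1).

```python
# Toy model of why inserting n duplicates can cost O(n^2) total:
# each new entry scans every existing duplicate of the same key
# (e.g. rechecking dead duplicates under a unique constraint).

def total_scan_steps(n):
    """Steps to insert n duplicates when the i-th insert scans the i
    existing ones: 0 + 1 + ... + (n - 1) = n*(n-1)/2."""
    steps = 0
    duplicates = []
    for _ in range(n):
        steps += len(duplicates)   # scan every existing duplicate
        duplicates.append(object())
    return steps

print(total_scan_steps(100))   # 4950 = 100*99/2
```

Keeping duplicates ordered by TID lets each insert locate its slot by a binary-search-style descent instead of a linear scan, which is the improvement being described.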

> Therefore, my idea is that there is a tradeoff between making a btree
> index unique or non-unique.  I think we will need an option for that.

We will need a flag somewhere to avoid requiring index
rebuilds during pg_upgrade. My old patch maintained backward
compatibility, but when using an older-version index, the sort order
of TIDs could not be assumed, and thus many optimizations had to be
disabled.

But it is totally doable, and necessary unless you accept a very
painful pg_upgrade.


[1] 
https://www.postgresql.org/message-id/flat/CAGTBQpZ-kTRQiAa13xG1GNe461YOwrA-s-ycCQPtyFrpKTaDBQ%40mail.gmail.com



Re: Locking B-tree leafs immediately in exclusive mode

2018-06-14 Thread Alexander Korotkov
On Thu, Jun 14, 2018 at 1:01 AM Peter Geoghegan  wrote:
> On Mon, Jun 11, 2018 at 9:30 AM, Alexander Korotkov
>  wrote:
> > On Mon, Jun 11, 2018 at 1:06 PM Simon Riggs  wrote:
> >> It's a good idea. How does it perform with many duplicate entries?
>
> I agree that this is a good idea. It independently occurred to me to
> do this. The L&Y algorithm doesn't take a position on this at all,
> supposing that *no* locks are needed at all (that's why I recommend
> people skip L&Y and just read the Lanin & Shasha paper -- L&Y is
> unnecessarily confusing). This idea seems relatively low risk.

Great.  Unfortunately, we don't have access to the 72-core machine
anymore.  But I'll do benchmarks on a smaller 40-core machine, which we
do have access to.  I hope there should be some win from this patch there too.

> > Our B-tree currently maintains duplicates unordered.  So, during
> > insertion we can traverse rightlinks in order to find a page which
> > would fit the new index tuple.
> > However, in this case we're traversing pages in exclusive lock mode, and
> > that happens already after the re-lock.  _bt_search() doesn't do
> > anything with that.
> > So, I assume there shouldn't be any degradation in the case of many
> > duplicate entries.
>
> BTW, I have a prototype patch that makes the keys at the leaf level
> unique. It is actually an implementation of nbtree suffix truncation
> that sometimes *adds* a new attribute in pivot tuples, rather than
> truncating away non-distinguishing attributes -- it adds a special
> heap TID attribute when it must. The patch typically increases fan-in,
> of course, but the real goal was to make all entries at the leaf level
> truly unique, as L&Y intended (we cannot do it without suffix
> truncation because that will kill fan-in). My prototype has full
> amcheck coverage, which really helped me gain confidence in my
> approach.

Could you, please, clarify what you mean by "fan-in"?  As I
understood, higher "fan-in" means more children on a single non-leaf
page, and in turn "better branching".  Is my understanding correct?

If my understanding is correct, then making the leaf level truly unique
without suffix truncation will kill fan-in, because the additional heap
TID attribute will increase pivot tuple size.  However, that doesn't
look like an inevitable shortcoming, because we could store the heap TID
in t_tid of pivot index tuples.  Yes, we've used the t_tid offset for
storing the number of attributes in truncated tuples.  But the heap TID is
going to be virtually the last attribute of the tuple.  So, pivot tuples
containing a heap TID are not going to be suffix-truncated anyway.  So,
for such tuples we can simply not set INDEX_ALT_TID_MASK, and they would
be assumed to have all the attributes.
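The idea of a heap TID acting as the final tiebreaking attribute can be sketched with a toy function (illustrative Python only; the function name, tuple representation, and exact tiebreaking rule are assumptions for illustration, not the nbtree implementation): keep the shortest prefix of the right page's first key that still sorts above the left page's last key, and append a heap TID only when every key attribute is equal.

```python
# Toy model of suffix truncation with a heap TID tiebreaker (not nbtree code).
# Keys are tuples of attributes; a truncated pivot is just a shorter tuple.

def make_pivot(last_left, first_right, last_left_tid):
    """Pick a pivot separating two sibling pages: the shortest prefix of
    first_right that still compares above last_left, falling back to
    appending the heap TID when all key attributes are equal."""
    for n in range(1, len(first_right) + 1):
        if last_left[:n] < first_right[:n]:
            return first_right[:n]         # prefix already distinguishes
    # Every attribute is equal: the heap TID must act as the tiebreaker.
    return last_left + (last_left_tid,)

print(make_pivot(('apple', 5), ('banana', 1), (7, 2)))  # ('banana',)
print(make_pivot(('apple', 5), ('apple', 9), (7, 2)))   # ('apple', 9)
print(make_pivot(('apple', 5), ('apple', 5), (7, 2)))   # ('apple', 5, (7, 2))
```

Only the all-attributes-equal case pays for the wider pivot, which is why the fan-in cost is usually avoidable.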

So, my idea is that it's not necessary to couple suffix truncation with
making leaf tuples unique.  Regarding just making leaf tuples unique,
I understand that it wouldn't always be a win.  When we have a lot of
duplicates, the current implementation searches among the pages containing
those duplicates for the first page containing enough free space.
While with a unique index, we would always have to insert into a
particular page.  Thus, the current implementation gives us better filling
of pages, and (probably) faster insertion.  But a unique index would
give us much better performance for retail tuple delete and update
(currently we support neither).  But I believe that future
pluggable table access methods will use both.

Therefore, my idea is that there is a tradeoff between making a btree
index unique or non-unique.  I think we will need an option for that.
I'm not yet sure if this option should be user-visible or an
auto-decision based on the table access method used.

> There are still big problems with my patch, though. It seems to hurt
> performance more often than it helps when there is a lot of
> contention, so as things stand I don't see a path to making it
> commitable. I am still going to clean it up some more and post it to
> -hackers, though. I still think it's quite interesting. I am not
> pursuing it much further now because it seems like my timing is bad --
> not because it seems like a bad idea. I can imagine something like
> zheap making this patch important again. I can also imagine someone
> seeing something that I missed.

I think that joint community efforts are necessary when a single person
doesn't see a path to making a patch committable.  So, I'm looking
forward to seeing your patch posted.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Locking B-tree leafs immediately in exclusive mode

2018-06-13 Thread Peter Geoghegan
On Mon, Jun 11, 2018 at 9:30 AM, Alexander Korotkov
 wrote:
> On Mon, Jun 11, 2018 at 1:06 PM Simon Riggs  wrote:
>> It's a good idea. How does it perform with many duplicate entries?

I agree that this is a good idea. It independently occurred to me to
do this. The L&Y algorithm doesn't take a position on this at all,
supposing that *no* locks are needed at all (that's why I recommend
people skip L&Y and just read the Lanin & Shasha paper -- L&Y is
unnecessarily confusing). This idea seems relatively low risk.

> Our B-tree currently maintains duplicates unordered.  So, during
> insertion we can traverse rightlinks in order to find a page which
> would fit the new index tuple.
> However, in this case we're traversing pages in exclusive lock mode, and
> that happens already after the re-lock.  _bt_search() doesn't do
> anything with that.
> So, I assume there shouldn't be any degradation in the case of many
> duplicate entries.

BTW, I have a prototype patch that makes the keys at the leaf level
unique. It is actually an implementation of nbtree suffix truncation
that sometimes *adds* a new attribute in pivot tuples, rather than
truncating away non-distinguishing attributes -- it adds a special
heap TID attribute when it must. The patch typically increases fan-in,
of course, but the real goal was to make all entries at the leaf level
truly unique, as L&Y intended (we cannot do it without suffix
truncation because that will kill fan-in). My prototype has full
amcheck coverage, which really helped me gain confidence in my
approach.

There are still big problems with my patch, though. It seems to hurt
performance more often than it helps when there is a lot of
contention, so as things stand I don't see a path to making it
commitable. I am still going to clean it up some more and post it to
-hackers, though. I still think it's quite interesting. I am not
pursuing it much further now because it seems like my timing is bad --
not because it seems like a bad idea. I can imagine something like
zheap making this patch important again. I can also imagine someone
seeing something that I missed.

-- 
Peter Geoghegan



Re: Locking B-tree leafs immediately in exclusive mode

2018-06-11 Thread Simon Riggs
On 5 June 2018 at 14:45, Alexander Korotkov  wrote:

> Currently _bt_search() always locks the leaf buffer in shared mode (aka
> BT_READ), while the caller can relock it later.  However, I don't see what
> prevents _bt_search() from locking the leaf immediately in exclusive mode
> (aka BT_WRITE) when required.
> When traversing the downlink from a non-leaf page of level 1 (the lowest
> level of non-leaf pages, just prior to leaf pages), we know that the next
> page is going to be a leaf.  In this case, we can immediately lock the next
> page in exclusive mode if required.
> For sure, it might happen that we didn't manage to exclusively lock the leaf
> in this way when _bt_getroot() points us to a leaf page.  But this case
> could be handled in _bt_search() by relocking.  Please find the
> implementation of this approach in the attached patch.

It's a good idea. How does it perform with many duplicate entries?

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
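The approach quoted above can be sketched as a toy descent model (invented Python structures and names; BT_READ/BT_WRITE are just labels here, not PostgreSQL code): when stepping down from a level-1 page the child is known to be a leaf, so the requested leaf lock mode is taken directly, with the leaf-as-root case left to a relock.

```python
# Toy model of the patched descent (not PostgreSQL code).  With the patch,
# stepping down from a level-1 page takes the exclusive lock (BT_WRITE) on
# the leaf directly, instead of locking shared (BT_READ) and relocking later.

BT_READ, BT_WRITE = "shared", "exclusive"

class Page:
    def __init__(self, level, children=()):
        self.level = level              # 0 = leaf
        self.children = list(children)

def bt_search(root, access):
    """Descend to the leftmost leaf, returning (level, lock mode) pairs.
    Inner pages are always locked shared; the leaf lock matches `access`."""
    locks = []
    page = root
    # If the root itself is a leaf, the real code handles this via a relock;
    # the toy model just picks the requested mode directly.
    mode = access if page.level == 0 else BT_READ
    locks.append((page.level, mode))
    while page.level > 0:
        child = page.children[0]
        # Descending from level 1: the child is known to be a leaf.
        child_mode = access if page.level == 1 else BT_READ
        locks.append((child.level, child_mode))
        page = child
    return locks

leaf = Page(0)
root = Page(2, [Page(1, [leaf])])
print(bt_search(root, BT_WRITE))
# [(2, 'shared'), (1, 'shared'), (0, 'exclusive')]
```

The win is that an inserter reaches the leaf already holding the exclusive lock, skipping the release-and-relock step that the unpatched shared-mode descent needs.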