Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-24 Thread Bartłomiej Romański
 As a matter of fact, you get the same slowdown after a rollback until
 autovacuum runs, and if autovacuum can't keep up...

Actually, this is not what we observe. The performance goes back to the
normal level immediately after committing or aborting the transaction.



On Wed, Sep 25, 2013 at 1:30 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 On Tue, Sep 24, 2013 at 11:03 AM, didier did...@gmail.com wrote:

 Hi


 On Tue, Sep 24, 2013 at 5:01 PM, jes...@krogh.cc wrote:


 Apparently it is waiting for locks; can't the check be made in a
 non-blocking way, so that if it ends up waiting for a lock it just
 assumes non-visible and moves on to the next one?


 Not only; it's one reason, but you can get the same slowdown with only
 one client and a bigger insert.

 The issue is that a btree search for one value degenerates to a linear
 search over 1000 or more tuples.

 As a matter of fact, you get the same slowdown after a rollback until
 autovacuum runs, and if autovacuum can't keep up...


 Have you experimentally verified the last part?  btree indices have some
 special kill-tuple code which should remove aborted tuples from the index
 the first time they are encountered, without need for a vacuum.

 Cheers,

 Jeff
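
One way to check the kill-tuple behaviour experimentally, sketched against
the test schema used in this thread (the row count is arbitrary and the
exact buffer numbers will vary):

begin;
insert into categories (name)
  select 'category_' || x from generate_series(1,100000) as x;
rollback;

-- the first index scan over the aborted range has to visit the heap for
-- every dead entry and should mark those entries killed in the btree
explain (analyze, buffers) select max(id) from categories;

-- the second scan should skip the killed entries and touch far fewer
-- buffers, without waiting for autovacuum
explain (analyze, buffers) select max(id) from categories;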



[PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-19 Thread Bartłomiej Romański
Hi all

We're experiencing a very strange performance issue. Our setup is a bit
more complicated, but we've managed to isolate and replicate the core
problem. Here's what we observe:

We took a strong machine (128 GB RAM, 8-core CPU, SSD drives...) and
installed a fresh copy of PostgreSQL 9.2 (Ubuntu Server 12.04 LTS, default
configuration).

Then, we created a test database with the following schema:

CREATE TABLE offers
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  category_id bigint NOT NULL,
  CONSTRAINT offers_pkey PRIMARY KEY (id)
);

CREATE TABLE categories
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  CONSTRAINT categories_pkey PRIMARY KEY (id)
);

and populated it in the following way:

insert into categories (name) select 'category_' || x from
generate_series(1,1000) as x;
insert into offers (name, category_id) select 'offer_' || x, floor(random()
* 1000) + 1 from generate_series(1,1000*1000) as x;

Finally, we created a python script to make simple queries in a loop:

while True:
    id = random.randrange(1, 1000 * 1000)
    db.execute(
        'select offers.id, offers.name, categories.id, categories.name '
        'from offers left join categories on categories.id = offers.category_id '
        'where offers.id = %s', (id,))
    print db.fetchall()

We start 20 instances simultaneously and measure performance:

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

Normally we observe about 30k QPS, which is a satisfying result (without any
tuning at all).

The problem occurs when we open a second console, start psql and type:

pgtest=> begin; insert into categories (name) select 'category_' || x from
generate_series(1,1000) as x;

We start a transaction and insert 1k records into the 'categories' table.
After that, the performance of the benchmark described above immediately
drops to about 1-2k QPS. That's 20-30 times slower! After closing the
transaction (committing or aborting - doesn't matter) it immediately jumps
back to 30k QPS.

Restarting the running script and other simple tricks do not change
anything. The hanging, open transaction is causing a huge slowdown. What's
more, when doing a similar insert (even a much larger one) into the 'offers'
table we do not observe this effect.
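
For comparison, the control case looks like this (a sketch; the row count
here is arbitrary):

begin;
insert into offers (name, category_id)
  select 'offer_' || x, floor(random() * 1000) + 1
  from generate_series(1,100000) as x;
-- leave the transaction open and watch the benchmark: no slowdown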

We analyzed the problem a bit deeper and found out that the actual query
execution times are not affected that much. They stay consistently close to
0.5 ms. This can be observed in the server log (after enabling the
appropriate option) and in the 'explain analyze...' output. Also, the
returned query plan does not change and looks optimal (pkey scan for
'offers' + pkey scan for 'categories').
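
For reference, the per-query check we mean is simply this (the id value is a
placeholder):

explain analyze
select offers.id, offers.name, categories.id, categories.name
from offers left join categories on categories.id = offers.category_id
where offers.id = 42;
-- the plan stays the same (index scan on offers_pkey plus index scan on
-- categories_pkey) and the reported execution time stays around 0.5 ms
-- whether or not the other transaction is open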

After a few random thoughts, we finally turned on the 'log_planner_stats'
option and found out that the planner execution times are highly affected
by the hanging transaction. Here's the typical output in the initial
situation:

2013-09-17 21:54:59 UTC LOG:  PLANNER STATISTICS
2013-09-17 21:54:59 UTC DETAIL:  ! system usage stats:
!   0.000137 elapsed 0.00 user 0.00 system sec
!   [2.169670 user 0.383941 sys total]
!   0/0 [0/11520] filesystem blocks in/out
!   0/0 [0/7408] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   0/0 [1362/7648] voluntary/involuntary context switches

And here's a typical output with a hanging transaction:

2013-09-17 21:56:12 UTC LOG:  PLANNER STATISTICS
2013-09-17 21:56:12 UTC DETAIL:  ! system usage stats:
!   0.027251 elapsed 0.008999 user 0.001000 system sec
!   [32.722025 user 3.550460 sys total]
!   0/0 [0/115128] filesystem blocks in/out
!   0/0 [0/7482] page faults/reclaims, 0 [0] swaps
!   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!   1/6 [12817/80202] voluntary/involuntary context switches

As you can see, planning takes over 100 times longer when the extra
transaction is open!
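
For completeness, this is roughly how the measurements above were enabled (a
sketch; the settings can also go into postgresql.conf, and changing them per
session requires superuser):

set log_min_duration_statement = 0;  -- log every statement with its run time
set log_planner_stats = on;          -- emit the PLANNER STATISTICS blocks above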

Any ideas why that is? It drastically affects the total query time.

I know that using prepared statements solves the problem completely, so we
can deal with it, but we've already spent a lot of time getting this far and
I'm personally a bit curious what the fundamental reason for such behavior
is.
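
For anyone who wants to try the workaround, here is a plain-SQL sketch of
what we mean by prepared statements (the statement name is arbitrary;
drivers expose the same thing through their own APIs):

prepare get_offer (bigint) as
  select offers.id, offers.name, categories.id, categories.name
  from offers left join categories on categories.id = offers.category_id
  where offers.id = $1;

-- the plan can be reused across executions, so the per-lookup planner
-- overhead (which is where the slowdown lives) largely goes away
execute get_offer(42);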

I'll be very thankful for any explanation what's going on here!

Thanks,
BR


Re: [PERFORM] Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-19 Thread Bartłomiej Romański
Hi Julian,

Here's my complete python script:

import psycopg2
import random
import math
import time

connection = psycopg2.connect(
    'host=localhost dbname=pgtest user=pgtest password=pgtest')
cursor = connection.cursor()

while True:
    id = random.randrange(1, 1000 * 1000)
    cursor.execute(
        'select offers.id, offers.name, categories.id, categories.name '
        'from offers left join categories on categories.id = offers.category_id '
        'where offers.id = %s', (id,))
    print cursor.fetchall()

So I assume that each of the 20 instances opens and uses a single
transaction. I've just tested the option with connection.autocommit = True
at the beginning, but it does not change anything. Also, in production
(where we first noticed the problem) we use a new transaction for every
select.

I start 20 instances of this python script (I use pv to measure
performance):

parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null

And then in a second console, I open psql and type:

pgtest=> begin; insert into categories (name) select 'category_' || x from
generate_series(1,1000) as x;

The QPS figure displayed by the first command immediately drops 20-30 times
and stays low for as long as the transaction with the insert is open.

Here's the script that I use to initialize the database. You should be able
to replicate the situation easily yourself if you want.

drop database pgtest;
drop user pgtest;
create user pgtest with password 'pgtest';
create database pgtest;
grant all privileges on database pgtest to pgtest;
\connect pgtest
CREATE TABLE categories
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  CONSTRAINT categories_pkey PRIMARY KEY (id)
);
CREATE TABLE offers
(
  id bigserial NOT NULL,
  name character varying NOT NULL,
  category_id bigint NOT NULL,
  CONSTRAINT offers_pkey PRIMARY KEY (id)
);
insert into categories (name) select 'category_' || x from
generate_series(1,1000) as x;
insert into offers (name, category_id) select 'offer_' || x, floor(random()
* 1000) + 1 from generate_series(1,1000*1000) as x;
alter table categories owner to pgtest;
alter table offers owner to pgtest;

Thanks,
Bartek




On Fri, Sep 20, 2013 at 3:42 AM, Julian temp...@internode.on.net wrote:

 Hi Bart,

 You are doing heavy random reads in addition to a 1000k row insert
 within a single transaction.

 Also, it is not clear whether your query within the python loop is itself
 within a transaction (i.e. setting psycopg2.connection.autocommit to True
 disables transactional queries).

 Depending on your pg adapter, it may open a transaction by default even
 though it may not be required.

 So please clarify whether this is the case.

 Regards,

 Julian.

 On 20/09/13 10:49, Bartłomiej Romański wrote:
  Hi all
 
  We're experiencing a very strange performance issue. Our setup is a
  bit more complicated, but we've managed to isolate and replicate the
  core problem. Here's what we observe:
 
  We took a strong machine (128 GB RAM, 8-core CPU, SSD drives...) and
  installed a fresh copy of PostgreSQL 9.2 (Ubuntu Server 12.04 LTS,
  default configuration).
 
  Then, we created a test database with the following schema:
 
  CREATE TABLE offers
  (
  id bigserial NOT NULL,
  name character varying NOT NULL,
  category_id bigint NOT NULL,
  CONSTRAINT offers_pkey PRIMARY KEY (id)
  );
 
  CREATE TABLE categories
  (
  id bigserial NOT NULL,
  name character varying NOT NULL,
  CONSTRAINT categories_pkey PRIMARY KEY (id)
  );
 
  and populated it in the following way:
 
  insert into categories (name) select 'category_' || x from
  generate_series(1,1000) as x;
  insert into offers (name, category_id) select 'offer_' || x,
  floor(random() * 1000) + 1 from generate_series(1,1000*1000) as x;
 
  Finally, we created a python script to make simple queries in a loop:
 
  while True:
      id = random.randrange(1, 1000 * 1000)
      db.execute(
          'select offers.id, offers.name, categories.id, categories.name '
          'from offers left join categories on categories.id = offers.category_id '
          'where offers.id = %s', (id,))
      print db.fetchall()
 
  We start 20 instances simultaneously and measure performance:
 
  parallel -j 20 ./test.py -- $(seq 1 20) | pv -i1 -l > /dev/null
 
  Normally we observe about 30k QPS, which is a satisfying result (without
  any tuning at all).
 
  The problem occurs when we open a second console, start psql and type:
 
  pgtest=> begin; insert into categories (name) select 'category_' || x
  from generate_series(1,1000) as x;
 
  We start a transaction and insert 1k records into the 'categories'
  table. After that, the performance of the benchmark described above
  immediately drops to about 1-2k QPS. That's 20-30 times slower! After
  closing the transaction (committing or aborting - doesn't matter) it
  immediately jumps back to 30k QPS.
 
  Restarting the running script and other simple tricks do not change
  anything. The hanging, open

[PERFORM] Re: Planner performance extremely affected by a hanging transaction (20-30 times)?

2013-09-17 Thread Bartłomiej Romański
 What's more, doing a similar insert (even a much larger one) into the
 'offers' table does not affect the benchmark results in any significant way...

Just want to clarify myself here: an insert into the 'offers' table does not
cause the slowdown. Only an insert into 'categories' causes the problem.