[PERFORM] Slow hash join performance with many batches

2015-06-01 Thread Alex Adriaanse
I have several databases that have the same schema but different amounts of 
data in them (let's categorize these as Small, Medium, and Large). We have a 
mammoth query with 13 CTEs that are LEFT JOINed against a main table. This 
query takes 30 minutes on the Small database and 2 hours on the Large one, but 
on the Medium database it takes in the vicinity of 14 hours.

Running truss/strace on the backend process running this query on the Medium 
database reveals that for a big chunk of this time Postgres 
creates/reads/unlinks a very large quantity (millions?) of tiny files inside 
pgsql_tmp. I also ran an EXPLAIN ANALYZE and am attaching the most 
time-consuming parts of the plan (with names redacted). Although I'm not too 
familiar with the internals of Postgres' Hash implementation, it seems that 
having over 4 million hash batches could be what's causing the problem.

I'm running PostgreSQL 9.3.5, and have work_mem set to 32MB.

Is there any way I can work around this problem, other than to experiment with 
disabling enable_hashjoin for this query/database?
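
For reference, these are the kinds of session-level experiments I have in 
mind; this is only a sketch, and the work_mem value below is a placeholder 
that would need tuning against the memory actually available on the box:

-- Give each hash node more memory so fewer batches are needed
-- (the 256MB figure is illustrative, not a recommendation).
SET work_mem = '256MB';

-- Or steer the planner away from hash joins for this session entirely.
SET enable_hashjoin = off;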

Alex
Hash Right Join  (cost=609908622207072.24..149222936608255392.00 rows=6928136791749514240 width=1223) (actual time=33401772.349..50706732.014 rows=406394 loops=1)
  Hash Cond: (cte_1.join_col = table_1.join_col)
  [13 CTEs omitted]
  ->  CTE Scan on cte_1  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=292893.037..324100.993 rows=365136 loops=1)
  ->  Hash  (cost=47862637793642.02..47862637793642.02 rows=3409566476502940 width=1219) (actual time=33056746.437..33056746.437 rows=406394 loops=1)
        Buckets: 4096  Batches: 4194304  Memory Usage: 2kB
        ->  Hash Right Join  (cost=298580771630.21..47862637793642.02 rows=3409566476502940 width=1219) (actual time=5912925.770..33032636.805 rows=406394 loops=1)
              Hash Cond: (cte_2.join_col = table_1.join_col)
              ->  CTE Scan on cte_2  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=280043.909..31.528 rows=365136 loops=1)
              ->  Hash  (cost=23618016658.04..23618016658.04 rows=1677961031534 width=1215) (actual time=5516337.065..5516337.065 rows=406394 loops=1)
                    Buckets: 4096  Batches: 4194304  Memory Usage: 2kB
                    ->  Hash Right Join  (cost=178906627.34..23618016658.04 rows=1677961031534 width=1215) (actual time=4067949.971..5495404.748 rows=406394 loops=1)
                          Hash Cond: (cte_3.join_col = table_1.join_col)
                          ->  CTE Scan on cte_3  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=280040.022..313331.309 rows=365136 loops=1)
                          ->  Hash  (cost=43588312.14..43588312.14 rows=825780416 width=1211) (actual time=3784880.335..3784880.335 rows=406394 loops=1)
                                Buckets: 4096  Batches: 32768  Memory Usage: 9kB
                                ->  Hash Right Join  (cost=689834.06..43588312.14 rows=825780416 width=1211) (actual time=3749003.819..3782275.100 rows=406394 loops=1)
                                      Hash Cond: (cte_4.join_col = table_1.join_col)
                                      ->  CTE Scan on cte_4  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=274018.453..306236.253 rows=365136 loops=1)
                                      ->  Hash  (cost=623636.13..623636.13 rows=406394 width=1207) (actual time=3474982.429..3474982.429 rows=406394 loops=1)
                                            Buckets: 4096  Batches: 16  Memory Usage: 6985kB
                                            [9 Merge Left Joins omitted]



Re: [PERFORM] Hosted servers with good DB disk performance?

2009-05-26 Thread Alex Adriaanse

Greg Smith wrote:
What I'd love to have is a way to rent a fairly serious piece of 
dedicated hardware, ideally with multiple (at least 4) hard drives in 
a RAID configuration and a battery-backed write cache.  The cache is 
negotiable. Linux would be preferred, FreeBSD or Solaris would also 
work; not Windows though (see good DB performance).


Is anyone aware of a company that offers such a thing?

I've used http://softlayer.com/ in the past and highly recommend them.  
They sell a wide range of dedicated servers, including ones that handle 
up to 12 HDDs/SSDs, and servers with battery-backed RAID controllers 
(I've been told they use mostly Adaptec cards as well as some 3ware 
cards).  In addition, all their servers are connected to a private 
network you can VPN into, and all include IPMI for remote management 
when you can't SSH into your server.  They have a host of other 
features; click on the Services tab on their site to find out more.


Alex




Re: [PERFORM] Bad row estimates

2006-03-08 Thread Alex Adriaanse
Thank you all for your valuable input.  I have tried creating a partial 
index, a GIST index, and a GIST + partial index, as suggested, but none of 
them seems to make a significant difference.  For instance:


CREATE INDEX test_table_1_interval_idx ON test_table_1 USING GIST
   (box(point(start_ts::abstime::integer, start_ts::abstime::integer),
        point(end_ts::abstime::integer, end_ts::abstime::integer)))
   WHERE id = g_id;

ANALYZE test_table_1;

EXPLAIN ANALYZE SELECT count(*) FROM test_table_1
   INNER JOIN test_table_2 ON (test_table_2.s_id = 13300613
                               AND test_table_1.id = test_table_2.n_id)
   WHERE box(point(start_ts::abstime::integer, start_ts::abstime::integer),
             point(end_ts::abstime::integer, end_ts::abstime::integer))
         ~ box(point(now()::abstime::integer, now()::abstime::integer),
               point(now()::abstime::integer, now()::abstime::integer))
   AND test_table_1.id = test_table_1.g_id;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
Aggregate  (cost=15.09..15.10 rows=1 width=0) (actual time=69.771..69.772 rows=1 loops=1)
  ->  Nested Loop  (cost=9.06..15.08 rows=1 width=0) (actual time=69.752..69.752 rows=0 loops=1)
        ->  Index Scan using test_table_1_interval_idx on test_table_1  (cost=0.07..4.07 rows=1 width=22) (actual time=2.930..3.607 rows=135 loops=1)
              Index Cond: (box(point((((start_ts)::abstime)::integer)::double precision, (((start_ts)::abstime)::integer)::double precision), point((((end_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision)) ~ box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision), point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)))
        ->  Bitmap Heap Scan on test_table_2  (cost=8.99..11.00 rows=1 width=12) (actual time=0.486..0.486 rows=0 loops=135)
              Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND (outer.id = test_table_2.n_id))
              ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
                    ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
                          Index Cond: (s_id = 13300613::numeric)
                    ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
                          Index Cond: (outer.id = test_table_2.n_id)
Total runtime: 69.961 ms

(Note: without the GIST index the query currently runs in about 65ms)

Its row estimates are still way off.  As a matter of fact, it almost 
seems as if the index doesn't affect row estimates at all.


What would you guys suggest?

Thanks,

Alex

Greg Stark wrote:

You could actually take short cuts using expression indexes to do this. If it
works out well then you might want to implement a real data type to avoid the
overhead of the SQL conversion functions.

Here's an example. If I were to do this for real I would look for a better
datatype than the box datatype and I would wrap the whole conversion in an SQL
function. But this will serve to demonstrate:

stark=> create table interval_test (start_ts timestamp with time zone, end_ts timestamp with time zone);
CREATE TABLE

stark=> create index interval_idx on interval_test using gist
  (box(point(start_ts::abstime::integer, end_ts::abstime::integer),
       point(start_ts::abstime::integer, end_ts::abstime::integer)));
CREATE INDEX

stark=> explain select * from interval_test where
  box(point(now()::abstime::integer, now()::abstime::integer), point(now()::abstime::integer, now()::abstime::integer))
  ~ box(point(start_ts::abstime::integer, end_ts::abstime::integer),
        point(start_ts::abstime::integer, end_ts::abstime::integer));
  QUERY PLAN   

[PERFORM] Bad row estimates

2006-03-03 Thread Alex Adriaanse

Hello,

I am doing some query optimizations for one of my clients who runs 
PostgreSQL 8.1.1, and am trying to cut down on the runtime of this 
particular query as it runs very frequently:


SELECT count(*) FROM test_table_1
   INNER JOIN test_table_2 ON
   (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
   WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
   AND test_table_1.id = test_table_1.g_id;


The related tables are as follows:

          Table "public.test_table_1"
  Column  |           Type           | Modifiers
----------+--------------------------+-----------
 id       | numeric(20,0)            | not null
 g_id     | numeric(20,0)            |
 start_ts | timestamp with time zone |
 end_ts   | timestamp with time zone |
Indexes:
    "test_table_1_pkey" PRIMARY KEY, btree (id)
    "test_table_1_ts_index" btree (start_ts, end_ts)

     Table "public.test_table_2"
 Column |     Type      | Modifiers
--------+---------------+-----------
 s_id   | numeric(20,0) |
 n_id   | numeric(20,0) |
Indexes:
    "test_table_2_n_id" btree (n_id)
    "test_table_2_s_id" btree (s_id)


When I run the query it uses the following plan:

Aggregate  (cost=217.17..217.18 rows=1 width=0) (actual time=107.829..107.830 rows=1 loops=1)
  ->  Nested Loop  (cost=11.09..217.16 rows=1 width=0) (actual time=107.817..107.817 rows=0 loops=1)
        ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=3.677..4.388 rows=155 loops=1)
              Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
              Filter: (id = g_id)
        ->  Bitmap Heap Scan on test_table_2  (cost=11.09..13.10 rows=1 width=12) (actual time=0.664..0.664 rows=0 loops=155)
              Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND (outer.id = test_table_2.n_id))
              ->  BitmapAnd  (cost=11.09..11.09 rows=1 width=0) (actual time=0.662..0.662 rows=0 loops=155)
                    ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.014..0.014 rows=1 loops=155)
                          Index Cond: (s_id = 13300613::numeric)
                    ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..8.36 rows=959 width=0) (actual time=0.645..0.645 rows=891 loops=155)
                          Index Cond: (outer.id = test_table_2.n_id)
Total runtime: 107.947 ms


However, when I turn off enable_nestloop it runs as follows:

Aggregate  (cost=465.86..465.87 rows=1 width=0) (actual time=5.763..5.764 rows=1 loops=1)
  ->  Merge Join  (cost=465.16..465.86 rows=1 width=0) (actual time=5.752..5.752 rows=0 loops=1)
        Merge Cond: (outer.id = inner.n_id)
        ->  Sort  (cost=204.06..204.07 rows=1 width=22) (actual time=5.505..5.505 rows=1 loops=1)
              Sort Key: test_table_1.id
              ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=4.458..4.995 rows=155 loops=1)
                    Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
                    Filter: (id = g_id)
        ->  Sort  (cost=261.10..261.44 rows=136 width=12) (actual time=0.235..0.236 rows=1 loops=1)
              Sort Key: test_table_2.n_id
              ->  Bitmap Heap Scan on test_table_2  (cost=2.48..256.28 rows=136 width=12) (actual time=0.218..0.219 rows=1 loops=1)
                    Recheck Cond: (s_id = 13300613::numeric)
                    ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.168..0.168 rows=1 loops=1)
                          Index Cond: (s_id = 13300613::numeric)
Total runtime: 5.893 ms

As you can see, the total runtime drops from 108ms to 6ms, indicating 
that a Merge Join is much better than a Nested Loop in this case.  It 
looks like the planner chooses a Nested Loop because it incorrectly 
estimates the (now() BETWEEN test_table_1.start_ts AND 
test_table_1.end_ts AND test_table_1.id = test_table_1.g_id) condition 
to return 1 row, whereas in reality it matches 155 rows.
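
(A quick way to double-check that count outside of EXPLAIN, purely as a 
sketch:)

-- Count the rows matched by the condition the planner misestimates;
-- this should come back as 155 rather than the estimated 1.
SELECT count(*) FROM test_table_1
WHERE now() BETWEEN start_ts AND end_ts
  AND id = g_id;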


I have set statistics for test_table_1.id and test_table_1.g_id to 1000 
and have ANALYZEd both tables (a sketch of those commands follows the 
ANALYZE output below).  This does not seem to make a bit of difference -- 
the planner keeps thinking the criteria will only match 1 row.  However, 
if I add a boolean column named equal_ids to test_table_1 holding the 
value of (test_table_1.id = test_table_1.g_id), and use that column in 
the query instead of the equality check, the planner does produce a much 
better row estimate.  Essentially:


ALTER TABLE test_table_1 ADD equal_ids BOOLEAN;
UPDATE test_table_1 SET equal_ids = (id = g_id);
VACUUM FULL test_table_1;
ANALYZE VERBOSE test_table_1;
INFO:  analyzing "public.test_table_1"
INFO:  "test_table_1": scanned 83 of 83 pages, containing 8827 live rows and 0 dead rows; 8827 rows in sample, 8827 estimated total rows
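
(For completeness, the statistics change mentioned above was done roughly 
like this -- a sketch, with 1000 being the target value stated earlier:)

ALTER TABLE test_table_1 ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE test_table_1 ALTER COLUMN g_id SET STATISTICS 1000;
ANALYZE test_table_1;
ANALYZE test_table_2;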


The plans listed above already reflect these changes.  When I substitute 
test_table_1.id = test_table_1.g_id with