[PERFORM] Slow hash join performance with many batches
I have several databases that share the same schema but hold different amounts of data (call them Small, Medium, and Large). We have a mammoth query with 13 CTEs that are LEFT JOINed against a main table. The query takes 30 minutes on the Small database and 2 hours on Large, but on the Medium database it takes in the vicinity of 14 hours. Running truss/strace on the backend process executing this query on the Medium database reveals that, for a big chunk of that time, Postgres creates/reads/unlinks a very large number (millions?) of tiny files inside pgsql_tmp. I also ran an EXPLAIN ANALYZE and am attaching the most time-consuming parts of the plan (with names redacted). Although I'm not too familiar with the internals of Postgres' hash join implementation, it seems that having over 4 million hash batches could be what's causing the problem. I'm running PostgreSQL 9.3.5 with work_mem set to 32MB. Is there any way I can work around this problem, other than experimenting with disabling enable_hashjoin for this query/database?
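For reference, the kind of session-level workaround I have in mind looks like this (a sketch only; the work_mem value is illustrative and would need experimentation):

```sql
-- Sketch: give this session a larger hash work area so the build side
-- fits in far fewer batches (256MB is a guess, not a recommendation).
SET work_mem = '256MB';

-- Or, scoped to a single transaction, steer the planner away from
-- hash joins; SET LOCAL reverts automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL enable_hashjoin = off;
-- ... run the big 13-CTE query here ...
COMMIT;
```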
Alex

Hash Right Join  (cost=609908622207072.24..149222936608255392.00 rows=6928136791749514240 width=1223) (actual time=33401772.349..50706732.014 rows=406394 loops=1)
  Hash Cond: (cte_1.join_col = table_1.join_col)
  [13 CTEs omitted]
  ->  CTE Scan on cte_1  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=292893.037..324100.993 rows=365136 loops=1)
  ->  Hash  (cost=47862637793642.02..47862637793642.02 rows=3409566476502940 width=1219) (actual time=33056746.437..33056746.437 rows=406394 loops=1)
        Buckets: 4096  Batches: 4194304  Memory Usage: 2kB
        ->  Hash Right Join  (cost=298580771630.21..47862637793642.02 rows=3409566476502940 width=1219) (actual time=5912925.770..33032636.805 rows=406394 loops=1)
              Hash Cond: (cte_2.join_col = table_1.join_col)
              ->  CTE Scan on cte_2  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=280043.909..31.528 rows=365136 loops=1)
              ->  Hash  (cost=23618016658.04..23618016658.04 rows=1677961031534 width=1215) (actual time=5516337.065..5516337.065 rows=406394 loops=1)
                    Buckets: 4096  Batches: 4194304  Memory Usage: 2kB
                    ->  Hash Right Join  (cost=178906627.34..23618016658.04 rows=1677961031534 width=1215) (actual time=4067949.971..5495404.748 rows=406394 loops=1)
                          Hash Cond: (cte_3.join_col = table_1.join_col)
                          ->  CTE Scan on cte_3  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=280040.022..313331.309 rows=365136 loops=1)
                          ->  Hash  (cost=43588312.14..43588312.14 rows=825780416 width=1211) (actual time=3784880.335..3784880.335 rows=406394 loops=1)
                                Buckets: 4096  Batches: 32768  Memory Usage: 9kB
                                ->  Hash Right Join  (cost=689834.06..43588312.14 rows=825780416 width=1211) (actual time=3749003.819..3782275.100 rows=406394 loops=1)
                                      Hash Cond: (cte_4.join_col = table_1.join_col)
                                      ->  CTE Scan on cte_4  (cost=0.00..16515608.32 rows=825780416 width=36) (actual time=274018.453..306236.253 rows=365136 loops=1)
                                      ->  Hash  (cost=623636.13..623636.13 rows=406394 width=1207) (actual time=3474982.429..3474982.429 rows=406394 loops=1)
                                            Buckets: 4096  Batches: 16  Memory Usage: 6985kB
[9 Merge Left Joins omitted]

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hosted servers with good DB disk performance?
Greg Smith wrote:
> What I'd love to have is a way to rent a fairly serious piece of dedicated
> hardware, ideally with multiple (at least 4) hard drives in a RAID
> configuration and a battery-backed write cache. The cache is negotiable.
> Linux would be preferred, FreeBSD or Solaris would also work; not Windows
> though (see "good DB performance"). Is anyone aware of a company that
> offers such a thing?

I've used http://softlayer.com/ in the past and highly recommend them. They sell a wide range of dedicated servers, including ones that handle up to 12 HDDs/SSDs, and servers with battery-backed RAID controllers (I've been told they use mostly Adaptec cards as well as some 3ware cards). In addition, all their servers are connected to a private network you can VPN into, and all include IPMI for remote management when you can't SSH into your server. They have a host of other features; click on the Services tab on their site to find out more.

Alex
Re: [PERFORM] Bad row estimates
Thank you all for your valuable input. I have tried creating a partial index, a GIST index, and a GIST + partial index, as suggested, but none of them seems to make a significant difference. For instance:

CREATE INDEX test_table_1_interval_idx ON test_table_1 USING GIST
  (box(point(start_ts::abstime::integer, start_ts::abstime::integer),
       point(end_ts::abstime::integer, end_ts::abstime::integer)))
  WHERE id = g_id;

ANALYZE test_table_1;

EXPLAIN ANALYZE SELECT count(*)
FROM test_table_1
INNER JOIN test_table_2
  ON (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
WHERE box(point(start_ts::abstime::integer, start_ts::abstime::integer),
          point(end_ts::abstime::integer, end_ts::abstime::integer))
      ~ box(point(now()::abstime::integer, now()::abstime::integer),
            point(now()::abstime::integer, now()::abstime::integer))
  AND test_table_1.id = test_table_1.g_id;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=15.09..15.10 rows=1 width=0) (actual time=69.771..69.772 rows=1 loops=1)
   ->  Nested Loop  (cost=9.06..15.08 rows=1 width=0) (actual time=69.752..69.752 rows=0 loops=1)
         ->  Index Scan using test_table_1_interval_idx on test_table_1  (cost=0.07..4.07 rows=1 width=22) (actual time=2.930..3.607 rows=135 loops=1)
               Index Cond: (box(point((((start_ts)::abstime)::integer)::double precision, (((start_ts)::abstime)::integer)::double precision), point((((end_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision)) ~ box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision), point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)))
         ->  Bitmap Heap Scan on test_table_2  (cost=8.99..11.00 rows=1 width=12) (actual time=0.486..0.486 rows=0 loops=135)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 69.961 ms

(Note: without the GIST index the query currently runs in about 65 ms.)

Its row estimates are still way off. As a matter of fact, it almost seems as if the index doesn't affect row estimates at all. What would you guys suggest?

Thanks,
Alex

Greg Stark wrote:
> You could actually take shortcuts using expression indexes to do this. If
> it works out well then you might want to implement a real data type to
> avoid the overhead of the SQL conversion functions.
>
> Here's an example. If I were to do this for real I would look for a better
> datatype than the box datatype, and I would wrap the whole conversion in
> an SQL function. But this will serve to demonstrate:
>
> stark=> create table interval_test (start_ts timestamp with time zone, end_ts timestamp with time zone);
> CREATE TABLE
> stark=> create index interval_idx on interval_test using gist (box(point(start_ts::abstime::integer, end_ts::abstime::integer), point(start_ts::abstime::integer, end_ts::abstime::integer)));
> CREATE INDEX
> stark=> explain select * from interval_test where box(point(now()::abstime::integer, now()::abstime::integer), point(now()::abstime::integer, now()::abstime::integer)) ~ box(point(start_ts::abstime::integer, end_ts::abstime::integer), point(start_ts::abstime::integer, end_ts::abstime::integer));
> QUERY PLAN
[PERFORM] Bad row estimates
Hello,

I am doing some query optimizations for one of my clients, who runs PostgreSQL 8.1.1, and am trying to cut down on the runtime of this particular query, as it runs very frequently:

SELECT count(*)
FROM test_table_1
INNER JOIN test_table_2
  ON (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
  AND test_table_1.id = test_table_1.g_id;

The related tables are as follows:

          Table "public.test_table_1"
  Column  |           Type           | Modifiers
----------+--------------------------+-----------
 id       | numeric(20,0)            | not null
 g_id     | numeric(20,0)            |
 start_ts | timestamp with time zone |
 end_ts   | timestamp with time zone |
Indexes:
    "test_table_1_pkey" PRIMARY KEY, btree (id)
    "test_table_1_ts_index" btree (start_ts, end_ts)

       Table "public.test_table_2"
 Column |     Type      | Modifiers
--------+---------------+-----------
 s_id   | numeric(20,0) |
 n_id   | numeric(20,0) |
Indexes:
    "test_table_2_n_id" btree (n_id)
    "test_table_2_s_id" btree (s_id)

When I run the query it uses the following plan:

 Aggregate  (cost=217.17..217.18 rows=1 width=0) (actual time=107.829..107.830 rows=1 loops=1)
   ->  Nested Loop  (cost=11.09..217.16 rows=1 width=0) (actual time=107.817..107.817 rows=0 loops=1)
         ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=3.677..4.388 rows=155 loops=1)
               Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
               Filter: (id = g_id)
         ->  Bitmap Heap Scan on test_table_2  (cost=11.09..13.10 rows=1 width=12) (actual time=0.664..0.664 rows=0 loops=155)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=11.09..11.09 rows=1 width=0) (actual time=0.662..0.662 rows=0 loops=155)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.014..0.014 rows=1 loops=155)
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..8.36 rows=959 width=0) (actual time=0.645..0.645 rows=891 loops=155)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 107.947 ms

However, when I turn off enable_nestloop it runs as follows:

 Aggregate  (cost=465.86..465.87 rows=1 width=0) (actual time=5.763..5.764 rows=1 loops=1)
   ->  Merge Join  (cost=465.16..465.86 rows=1 width=0) (actual time=5.752..5.752 rows=0 loops=1)
         Merge Cond: ("outer".id = "inner".n_id)
         ->  Sort  (cost=204.06..204.07 rows=1 width=22) (actual time=5.505..5.505 rows=1 loops=1)
               Sort Key: test_table_1.id
               ->  Index Scan using test_table_1_ts_index on test_table_1  (cost=0.01..204.05 rows=1 width=22) (actual time=4.458..4.995 rows=155 loops=1)
                     Index Cond: ((now() >= start_ts) AND (now() <= end_ts))
                     Filter: (id = g_id)
         ->  Sort  (cost=261.10..261.44 rows=136 width=12) (actual time=0.235..0.236 rows=1 loops=1)
               Sort Key: test_table_2.n_id
               ->  Bitmap Heap Scan on test_table_2  (cost=2.48..256.28 rows=136 width=12) (actual time=0.218..0.219 rows=1 loops=1)
                     Recheck Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.48 rows=136 width=0) (actual time=0.168..0.168 rows=1 loops=1)
                           Index Cond: (s_id = 13300613::numeric)
 Total runtime: 5.893 ms

As you can see, the total runtime drops from 108 ms to 6 ms, indicating that a Merge Join is much better than a Nested Loop in this case. It looks like the planner chooses a Nested Loop because it incorrectly estimates the (now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts AND test_table_1.id = test_table_1.g_id) condition to return 1 row, whereas in reality it returns 155 rows. I have set statistics for test_table_1.id and test_table_1.g_id to 1000, and have ANALYZEd both tables. This does not seem to make a bit of difference -- the planner keeps estimating that the criteria will return only 1 row. However, if I add a boolean column named equal_ids to test_table_1 with the value (test_table_1.id = test_table_1.g_id), and use that in the query instead of the equality, it does produce a much better row estimate.
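Until the estimates are fixed, the stopgap I'm considering is pinning the plan choice per transaction (a sketch only; SET LOCAL reverts automatically at commit or rollback):

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- applies only inside this transaction
-- ... run the count(*) query here ...
COMMIT;
```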
Essentially:

ALTER TABLE test_table_1 ADD equal_ids BOOLEAN;
UPDATE test_table_1 SET equal_ids = (id = g_id);
VACUUM FULL test_table_1;
ANALYZE VERBOSE test_table_1;
INFO:  analyzing "public.test_table_1"
INFO:  "test_table_1": scanned 83 of 83 pages, containing 8827 live rows and 0 dead rows; 8827 rows in sample, 8827 estimated total rows

The plans listed above already reflect these changes. When I substitute test_table_1.id = test_table_1.g_id with
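Roughly, the substituted query becomes (a sketch of the rewrite described above):

```sql
-- Sketch: the precomputed equal_ids boolean replaces the id = g_id
-- equality, letting the planner use the statistics ANALYZE gathered.
SELECT count(*)
FROM test_table_1
INNER JOIN test_table_2
  ON (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
WHERE now() BETWEEN test_table_1.start_ts AND test_table_1.end_ts
  AND test_table_1.equal_ids;
```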