Hi, I have similar problem as in http://www.postgresql.org/message-id/flat/52b311c4.1070...@gmail.com#52b311c4.1070...@gmail.com
server version is 9.3.4 Here is only two quite simple tables: db_new=# \d activities_example Table "public.activities_example" Column | Type | Modifiers ----------------+---------+----------- id | integer | order_chain_id | integer | Indexes: "activities_example_idx" btree (order_chain_id) db_new=# \d orders_example Table "public.orders_example" Column | Type | Modifiers --------+---------+----------- id | integer | Number of rows as below: db_new=# select count(*) from activities_example ; count --------- 3059965 db_new=# select count(*) from orders_example ; count ------- 19038 db_new=# select count(*) from activities_example where order_chain_id in (select id from orders_example); count ------- 91426 (1 row) and I can see that planner uses hashjoin with all enabled options and nested loop with disabled parameter: db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Hash Semi Join (cost=513.36..57547.59 rows=89551 width=8) (actual time=18.340..966.367 rows=91426 loops=1) Hash Cond: (activities_example.order_chain_id = orders_example.id) -> Seq Scan on activities_example (cost=0.00..44139.65 rows=3059965 width=8) (actual time=0.018..294.216 rows=3059965 loops=1) -> Hash (cost=275.38..275.38 rows=19038 width=4) (actual time=5.458..5.458 rows=19038 loops=1) Buckets: 2048 Batches: 1 Memory Usage: 670kB -> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.308 rows=19038 loops=1) Total runtime: 970.234 ms (7 rows) db_new=# set enable_hashjoin = off; SET db_new=# explain analyze select * from activities_example where order_chain_id in (select id from orders_example); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1629.09..166451.01 rows=89551 width=8) (actual time=16.091..116.476 rows=91426 loops=1) -> Unique (cost=1628.66..1723.85 rows=19038 width=4) (actual time=15.929..23.156 rows=19038 loops=1) -> Sort (cost=1628.66..1676.25 rows=19038 width=4) (actual time=15.892..19.884 rows=19038 loops=1) Sort Key: orders_example.id Sort Method: external sort Disk: 264kB -> Seq Scan on orders_example (cost=0.00..275.38 rows=19038 width=4) (actual time=0.015..2.747 rows=19038 loops=1) -> Index Scan using activities_example_idx on activities_example (cost=0.43..8.60 rows=5 width=8) (actual time=0.002..0.004 rows=5 loops=19038) Index Cond: (order_chain_id = orders_example.id) Total runtime: 121.366 ms (9 rows) second runtime is much more quicker. What is the reason of "Seq Scan on activities_example" in the first case? Is it possible to force optimizer choose the second plan without doing "set enable_hashjoin = off;" ? Increasing of 'effective_cache_size' leads to similar thing with mergejoin, other options (work_mem, shared_buffers. etc) do not change anything. Thanks in advance. -- Regards, Andrey Lizenko