On Dec 12, 2012, at 8:25 AM, Huan Ruan <leohuanr...@gmail.com> wrote:
> Hello All > > While investigating switching to Postgres, we come across a query plan that > uses hash join and is a lot slower than a nested loop join. > > I don't understand why the optimiser chooses the hash join in favor of the > nested loop. What can I do to get the optimiser to make a better decision > (nested loop in this case)? I have run analyze on both tables. > Optimiser thinks that nested loop is more expensive, because of point PK lookups, which a random io. Can you set random_page_cost to 2 or 3 and try again? > The query is, > /* > smalltable has about 48,000 records. > bigtable has about 168,000,000 records. > invtranref is char(10) and is the primary key for both tables > */ > SELECT > * > FROM IM_Match_Table smalltable > inner join invtran bigtable on > bigtable.invtranref = smalltable.invtranref > The hash join plan is, > "Hash Join (cost=1681.87..6414169.04 rows=48261 width=171)" > " Output: smalltable.invtranref, smalltable.itbatchref, smalltable.trantype, > smalltable.trandate, smalltable.invprodref, smalltable.invheadref, > bigtable.itbatchref, bigtable.invtranref, bigtable.invheadref, > bigtable.feeplanref, bigtable.invprodref, bigtable.trantype, > bigtable.trandate, bigtable.pricedate, bigtable.units, bigtable.tranamount, > bigtable.createmode, bigtable.transtat, bigtable.sysversion, > bigtable.sysuser, bigtable.rectype, bigtable.recstat, bigtable.seqnum, > bigtable.transign" > " Hash Cond: (bigtable.invtranref = smalltable.invtranref)" > " -> Seq Scan on public.invtran bigtable (cost=0.00..4730787.28 > rows=168121728 width=108)" > " Output: bigtable.itbatchref, bigtable.invtranref, > bigtable.invheadref, bigtable.feeplanref, bigtable.invprodref, > bigtable.trantype, bigtable.trandate, bigtable.pricedate, bigtable.units, > bigtable.tranamount, bigtable.createmode, bigtable.transtat, > bigtable.sysversion, bigtable.sysuser, bigtable.rectype, bigtable.recstat, > bigtable.seqnum, bigtable.transign" > " -> Hash (cost=1078.61..1078.61 rows=48261 width=63)" > " Output: smalltable.invtranref, smalltable.itbatchref, > smalltable.trantype, smalltable.trandate, smalltable.invprodref, > smalltable.invheadref" > " -> Seq Scan on public.im_match_table smalltable > (cost=0.00..1078.61 rows=48261 width=63)" > " Output: smalltable.invtranref, smalltable.itbatchref, > smalltable.trantype, smalltable.trandate, smalltable.invprodref, > smalltable.invheadref" > The nested loop join plan is, > "Nested Loop (cost=0.00..12888684.07 rows=48261 width=171)" > " Output: smalltable.invtranref, smalltable.itbatchref, smalltable.trantype, > smalltable.trandate, smalltable.invprodref, smalltable.invheadref, > bigtable.itbatchref, bigtable.invtranref, bigtable.invheadref, > bigtable.feeplanref, bigtable.invprodref, bigtable.trantype, > bigtable.trandate, bigtable.pricedate, bigtable.units, bigtable.tranamount, > bigtable.createmode, bigtable.transtat, bigtable.sysversion, > bigtable.sysuser, bigtable.rectype, bigtable.recstat, bigtable.seqnum, > bigtable.transign" > " -> Seq Scan on public.im_match_table smalltable (cost=0.00..1078.61 > rows=48261 width=63)" > " Output: smalltable.invtranref, smalltable.itbatchref, > smalltable.trantype, smalltable.trandate, smalltable.invprodref, > smalltable.invheadref" > " -> Index Scan using pk_invtran on public.invtran bigtable > (cost=0.00..267.03 rows=1 width=108)" > " Output: bigtable.itbatchref, bigtable.invtranref, > bigtable.invheadref, bigtable.feeplanref, bigtable.invprodref, > bigtable.trantype, bigtable.trandate, bigtable.pricedate, bigtable.units, > bigtable.tranamount, bigtable.createmode, bigtable.transtat, > bigtable.sysversion, bigtable.sysuser, bigtable.rectype, bigtable.recstat, > bigtable.seqnum, bigtable.transign" > " Index Cond: (bigtable.invtranref = smalltable.invtranref)" > The version is PostgreSQL 9.2.0 on x86_64-unknown-linux-gnu, compiled by gcc > (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit. Server specs are: > Centos, ext4 > 24GB memory > 6 cores hyper-threaded (Intel(R) Xeon(R) CPU E5645). > raid 10 on 4 sata disks > Config changes are > > shared_buffers = 6GB > effective_cache_size = 18GB > work_mem = 10MB > maintenance_work_mem = 3GB > Many Thanks > Huan > > > >