[PERFORM] two table join just not fast enough.

2011-11-02 Thread Brian Fehrle

Hi all,

I've got a query that I need to squeeze as much speed out of as I can.

When I execute this query, the average time it takes is about 190 ms. I 
increased my work_mem from 1 MB to 50 MB and that brought the average 
down to about 170 ms, but that's still not fast enough. This query is 
executed extremely frequently, so most of the data it touches should 
already be sitting in cache.
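
One way to check that assumption (on 9.0 or later) is to run the query 
under EXPLAIN (ANALYZE, BUFFERS) and compare the shared "hit" versus 
"read" counts per node. A minimal sketch, using the query shown further 
down:

-- Sketch only: the BUFFERS option (9.0+) reports shared hit vs. read
-- block counts per plan node, i.e. how much is served from cache.
EXPLAIN (ANALYZE, BUFFERS)
SELECT yankee.my_id
FROM yankee
INNER JOIN hotel_zulu
    ON hotel_zulu.my_id = yankee.zulu_id
    AND hotel_zulu.type IN ('string1', 'string2', 'string3', 'string4')
WHERE yankee.your_id = 402513;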


Some settings
work_mem = 50MB
shared_buffers = 5GB

I've made sure that both tables have been analyzed recently (with the 
statistics target set to 100, 400, and even 1500), and table bloat is 
low (the 150 MB table has about 7 MB of bloat).
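
For reference, a per-column statistics bump plus re-analyze might look 
roughly like the sketch below; 'hotel_zulu' and 'type' are simply the 
names used in the query that follows:

-- Sketch only: raise the statistics target for a join/filter column and
-- re-analyze, giving the planner larger MCV lists and histograms.
ALTER TABLE hotel_zulu ALTER COLUMN type SET STATISTICS 400;
ANALYZE hotel_zulu;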


Here's the query:
SELECT yankee.my_id
FROM yankee
INNER JOIN hotel_zulu
ON hotel_zulu.my_id = yankee.zulu_id
AND hotel_zulu.type IN ('string1', 'string2', 'string3', 'string4')
WHERE yankee.your_id=402513;

And here is a query plan.

Hash Join  (cost=17516.470..26386.660 rows=27624 width=4) (actual time=309.194..395.135 rows=12384 loops=1)
  Hash Cond: (yankee.alpha = hotel_zulu.quebec)
  ->  Bitmap Heap Scan on yankee  (cost=1066.470..8605.770 rows=27624 width=20) (actual time=5.178..34.693 rows=26963 loops=1)
        Recheck Cond: (mike = 402513)
        ->  Bitmap Index Scan on hotel_alpha  (cost=0.000..1059.570 rows=27624 width=0) (actual time=4.770..4.770 rows=26967 loops=1)
              Index Cond: (mike = 402513)
  ->  Hash  (cost=14465.080..14465.080 rows=114154 width=16) (actual time=303.717..303.717 rows=129908 loops=1)
        Buckets: 4096  Batches: 8  Memory Usage: 784kB
        ->  Bitmap Heap Scan on hotel_zulu  (cost=2461.770..14465.080 rows=114154 width=16) (actual time=25.642..185.253 rows=129908 loops=1)
              Recheck Cond: ((two)::golf = ANY ('xray'::golf[]))
              ->  Bitmap Index Scan on kilo  (cost=0.000..2433.230 rows=114154 width=0) (actual time=23.887..23.887 rows=130292 loops=1)
                    Index Cond: ((two)::golf = ANY ('xray'::golf[]))



One thing I notice is that the estimated row count is 27624 while the 
actual number of rows returned is 12384. Quite a bit different.


The second table (known here as hotel_zulu) is joined to the first table 
on zulu_id, with an additional filter on the column 'type'. There are 
single-column indexes on each of these columns, and any multi-column 
index I put on them is simply ignored by the planner.
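
For what it's worth, a composite or partial index on those two columns 
would look something like the sketch below (column names taken from the 
query above); whether the planner actually prefers it over the 
single-column indexes depends on the selectivity of the filter:

-- Sketch only: composite index on the filter column plus the join column.
CREATE INDEX hotel_zulu_type_my_id_idx ON hotel_zulu (type, my_id);

-- Or a partial index, if these four type values are the common case:
CREATE INDEX hotel_zulu_my_id_partial_idx ON hotel_zulu (my_id)
WHERE type IN ('string1', 'string2', 'string3', 'string4');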


Any thoughts on ways to tweak this?

- Brian F



Re: [PERFORM] two table join just not fast enough.

2011-11-02 Thread Tom Lane
Brian Fehrle bri...@consistentstate.com writes:
 I've got a query that I need to squeeze as much speed out of as I can.

Hmm ... are you really sure this is being run with work_mem = 50MB?
The hash join is getting batched, which means the executor thinks it's
working under a memory constraint significantly less than the size of
the filtered inner relation, which should be no more than a couple
megabytes according to this.

I'm not sure how much that will save, since the hashjoin seems to be
reasonably speedy anyway, but there's not much other fat to trim here.

One minor suggestion is to think whether you really need string
comparisons here or could convert that to use of an enum type.
String compares ain't cheap, especially not in non-C locales.

regards, tom lane



Re: [PERFORM] two table join just not fast enough.

2011-11-02 Thread Brian Fehrle

Thanks Tom,
It looks like I pasted an older explain plan; it is almost exactly the 
same as the one with 50MB work_mem, except for the hash join's 
Buckets/Batches line, which showed more memory used and only a single 
batch.


Running with 50MB of work_mem rather than 1MB, the query went from an 
average of 190 ms to 169 ms, so it did help some but it wasn't a game 
changer (in fact, for this specific query, anything beyond 6MB of 
work_mem made no further difference).
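
For anyone repeating the experiment, work_mem can be tested per query 
without touching the server-wide setting. A minimal sketch, assuming the 
query from earlier in the thread:

-- Sketch only: SET LOCAL applies for the current transaction and is
-- discarded at COMMIT/ROLLBACK, so nothing persists.
BEGIN;
SET LOCAL work_mem = '6MB';
EXPLAIN ANALYZE
SELECT yankee.my_id
FROM yankee
INNER JOIN hotel_zulu
    ON hotel_zulu.my_id = yankee.zulu_id
    AND hotel_zulu.type IN ('string1', 'string2', 'string3', 'string4')
WHERE yankee.your_id = 402513;
ROLLBACK;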


I have other ideas for getting this running faster that I'll be 
exploring tomorrow, including looking at using an enum type.
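
A minimal sketch of the enum conversion Tom suggests, assuming the 
'type' column holds only a small, fixed set of strings (the enum type 
name and labels here are placeholders):

-- Sketch only: replace the string 'type' column with an enum, so the
-- filter compares compact enum values instead of locale-aware strings.
-- The enum labels must cover every value already present in the column.
CREATE TYPE hotel_zulu_type AS ENUM
    ('string1', 'string2', 'string3', 'string4');

ALTER TABLE hotel_zulu
    ALTER COLUMN type TYPE hotel_zulu_type
    USING type::text::hotel_zulu_type;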


- Brian F

On 11/02/2011 05:53 PM, Tom Lane wrote:

Brian Fehrle bri...@consistentstate.com writes:

I've got a query that I need to squeeze as much speed out of as I can.

Hmm ... are you really sure this is being run with work_mem = 50MB?
The hash join is getting batched, which means the executor thinks it's
working under a memory constraint significantly less than the size of
the filtered inner relation, which should be no more than a couple
megabytes according to this.

I'm not sure how much that will save, since the hashjoin seems to be
reasonably speedy anyway, but there's not much other fat to trim here.

One minor suggestion is to think whether you really need string
comparisons here or could convert that to use of an enum type.
String compares ain't cheap, especially not in non-C locales.

regards, tom lane





Re: [PERFORM] two table join just not fast enough.

2011-11-02 Thread Mark Kirkwood

On 03/11/11 09:12, Brian Fehrle wrote:



And here is a query plan.

Hash Join  (cost=17516.470..26386.660 rows=27624 width=4) (actual time=309.194..395.135 rows=12384 loops=1)
  Hash Cond: (yankee.alpha = hotel_zulu.quebec)
  ->  Bitmap Heap Scan on yankee  (cost=1066.470..8605.770 rows=27624 width=20) (actual time=5.178..34.693 rows=26963 loops=1)
        Recheck Cond: (mike = 402513)
        ->  Bitmap Index Scan on hotel_alpha  (cost=0.000..1059.570 rows=27624 width=0) (actual time=4.770..4.770 rows=26967 loops=1)
              Index Cond: (mike = 402513)
  ->  Hash  (cost=14465.080..14465.080 rows=114154 width=16) (actual time=303.717..303.717 rows=129908 loops=1)
        Buckets: 4096  Batches: 8  Memory Usage: 784kB
        ->  Bitmap Heap Scan on hotel_zulu  (cost=2461.770..14465.080 rows=114154 width=16) (actual time=25.642..185.253 rows=129908 loops=1)
              Recheck Cond: ((two)::golf = ANY ('xray'::golf[]))
              ->  Bitmap Index Scan on kilo  (cost=0.000..2433.230 rows=114154 width=0) (actual time=23.887..23.887 rows=130292 loops=1)
                    Index Cond: ((two)::golf = ANY ('xray'::golf[]))



Might be worth posting table definitions, as this plan does not 
immediately look like it came from the query you posted. Also, unless I 
am misreading the output, it looks like you have some custom datatypes 
(e.g. 'golf'), so more info there could be useful too.


When we have that, there may be something to be learned from examining 
the pg_stats data for the join and predicate columns used in these queries.
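
For example, something along these lines (with the obfuscated names 
replaced by the real table and column names) would show the n_distinct 
estimates and most-common-value lists the planner is working from:

-- Sketch only: planner statistics for the join and predicate columns.
SELECT tablename, attname, null_frac, n_distinct,
       most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename IN ('yankee', 'hotel_zulu')
  AND attname IN ('your_id', 'zulu_id', 'my_id', 'type');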


regards

Mark
