Re: [PERFORM] Hash Join node sometimes slow

2014-07-03 Thread Dave Roberge
Tom Lane writes:
 I'd bet on the extra time being in I/O for the per-batch temp files, since 
 it's hard
 to see what else would be different if the data were identical in each run.
 Maybe the kernel is under memory pressure and is dropping the file data from
 in-memory disk cache.  Or maybe it's going to disk all the time but the slow 
 runs
 face more I/O congestion.
 
 Personally, for a problem of this size I'd increase work_mem enough so you
 don't get multiple batches in the first place.

Tom thanks for the response. I'm very much a novice in this area - what do you 
mean by problem of this size, i.e. number of rows, hash memory usage? Does 
'shared read' mean either 1) it was read from disk or 2) it was read from 
in-memory disk cache?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Hash Join node sometimes slow

2014-07-02 Thread Dave Roberge
Hi, I'm in the process of attempting to tune some slow queries. I came 
across a scenario where I'm not entirely sure how I 
might figure out why a node is taking awhile to process. I'm not concerned 
with the query itself, we are working to figure 
out how we can make it faster. But I was hoping someone might be able to 
provide some insight into why a hash join is 
sometimes slow.

For example, running explain (analyze, buffers) with the query, 4/5 times we 
will see the following:

-  Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual 
time=532.634..4018.678 rows=258648 loops=1)
  Hash Cond: (p.a = c.c)
  Buffers: shared hit=4 read=29147, temp read=12943 written=12923
  -  Seq Scan on p (cost=0.00..38496.88 rows=1503188 width=60) (actual 
time=0.013..1388.205 rows=1503188 loops=1)
Buffers: shared hit=1 read=23464
  -  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual 
time=527.237..527.237 rows=57789 loops=1)
Buckets: 4096  Batches: 4  Memory Usage: 632kB
Buffers: shared hit=3 read=5683, temp read=617 written=771

The other times, we will see something like this:

-  Hash Join  (cost=16385.76..103974.09 rows=523954 width=64) (actual 
time=587.277..15208.621 rows=258648 loops=1)
  Hash Cond: (p.a = c.c)
  Buffers: shared hit=26 read=29125, temp read=12943 written=12923
  -  Seq Scan on p  (cost=0.00..38496.88 rows=1503188 width=60) (actual 
time=0.013..1525.608 rows=1503188 loops=1)
Buffers: shared hit=22 read=23443
  -  Hash  (cost=15382.47..15382.47 rows=57703 width=12) (actual 
time=581.638..581.638 rows=57789 loops=1)
Buckets: 4096  Batches: 4  Memory Usage: 632kB
Buffers: shared hit=4 read=5682, temp read=617 written=771

Does anyone have ideas on what might be causing the difference in timing for 
the hash join node?

Thanks



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance