Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Alex Vinnik
On Mon, Jan 28, 2013 at 6:55 PM, Filip Rembiałkowski wrote:

>
> On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik  wrote:
>
>> It sure turned out that default settings are not a good fit.
>>
>
> do you know pgtune?
> it's a good tool for starters, if you want a fast postgres and don't
> really want to learn what's behind the scenes.
>
Yeah.. I came across pgtune but noticed that latest version dated
2009-10-29 http://pgfoundry.org/frs/?group_id=1000416 which is kind of
outdated. Tar file has settings for pg 8.3. Is still relevant?


>
> random_page_cost=1 might be not what you really want.
> it would mean that random reads are as fast as as sequential reads, which
> probably is true only for SSD
>
What randon_page_cost would be more appropriate for EC2 EBS Provisioned
volume that can handle 2,000 IOPS?

>
>
>
> Filip
>
>


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Alex Vinnik
Setting work_mem to 64MB triggers in memory sort but look what happens with
views look up. PG goes through all records there "Seq Scan on views"
instead of using visitor_id index and I have only subset of real data to
play around. Can imagine what cost would be running it against bigger
dataset. Something else is in play here that makes planner to take this
route. Any ideas how to gain more insight into planner's inner workings?

QUERY PLAN
Sort  (cost=960280.46..960844.00 rows=225414 width=8) (actual
time=23328.040..23537.126 rows=209401 loops=1)
  Sort Key: visits.id, views.id
  Sort Method: quicksort  Memory: 15960kB
  ->  Hash Join  (cost=8089.16..940238.66 rows=225414 width=8) (actual
time=6622.072..22995.890 rows=209401 loops=1)
Hash Cond: (views.visit_id = visits.id)
->  Seq Scan on views  (cost=0.00..831748.05 rows=8724205 width=8)
(actual time=0.093..10552.306 rows=6995893 loops=1)
->  Hash  (cost=6645.51..6645.51 rows=115492 width=4) (actual
time=307.389..307.389 rows=131311 loops=1)
  Buckets: 16384  Batches: 1  Memory Usage: 4617kB
  ->  Index Scan using visits_created_at_index on visits
 (cost=0.00..6645.51 rows=115492 width=4) (actual time=0.040..163.151
rows=131311 loops=1)
Index Cond: ((created_at >= '2013-01-15
00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16
00:00:00'::timestamp without time zone))
Total runtime: 23733.045 ms


On Mon, Jan 28, 2013 at 8:31 PM, Merlin Moncure  wrote:

> On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik  wrote:
> > It sure turned out that default settings are not a good fit. Setting
> > random_page_cost to 1.0 made query to run in 2.6 seconds and I clearly
> see
> > that indexes are being used in explain plan and IO utilization is close
> to
> > 0.
> >
> > QUERY PLAN
> > Sort  (cost=969787.23..970288.67 rows=200575 width=8) (actual
> > time=2176.045..2418.162 rows=241238 loops=1)
> >   Sort Key: visits.id, views.id
> >   Sort Method: external sort  Disk: 4248kB
> >   ->  Nested Loop  (cost=0.00..950554.81 rows=200575 width=8) (actual
> > time=0.048..1735.357 rows=241238 loops=1)
> > ->  Index Scan using visits_created_at_index on visits
> > (cost=0.00..5459.16 rows=82561 width=4) (actual time=0.032..178.591
> > rows=136021 loops=1)
> >   Index Cond: ((created_at >= '2012-12-15
> 00:00:00'::timestamp
> > without time zone) AND (created_at < '2012-12-16 00:00:00'::timestamp
> > without time zone))
> > ->  Index Scan using views_visit_id_index on views
> > (cost=0.00..11.33 rows=12 width=8) (actual time=0.004..0.006 rows=2
> > loops=136021)
> >   Index Cond: (visit_id = visits.id)
> > Total runtime: 2635.169 ms
> >
> > However I noticed that sorting is done using disk("external sort  Disk:
> > 4248kB") which prompted me to take a look at work_mem. But it turned out
> > that small increase to 4MB from default 1MB turns off index usage and
> query
> > gets x10 slower. IO utilization jumped to 100% from literally nothing. so
> > back to square one...
> >
> > QUERY PLAN
> > Sort  (cost=936642.75..937144.19 rows=200575 width=8) (actual
> > time=33200.762..33474.443 rows=241238 loops=1)
> >   Sort Key: visits.id, views.id
> >   Sort Method: external merge  Disk: 4248kB
> >   ->  Hash Join  (cost=6491.17..917410.33 rows=200575 width=8) (actual
> > time=7156.498..32723.221 rows=241238 loops=1)
> > Hash Cond: (views.visit_id = visits.id)
> > ->  Seq Scan on views  (cost=0.00..832189.95 rows=8768395
> width=8)
> > (actual time=0.100..12126.342 rows=8200704 loops=1)
> > ->  Hash  (cost=5459.16..5459.16 rows=82561 width=4) (actual
> > time=353.683..353.683 rows=136021 loops=1)
> >   Buckets: 16384  Batches: 2 (originally 1)  Memory Usage:
> > 4097kB
> >   ->  Index Scan using visits_created_at_index on visits
> > (cost=0.00..5459.16 rows=82561 width=4) (actual time=0.032..175.051
> > rows=136021 loops=1)
> > Index Cond: ((created_at >= '2012-12-15
> > 00:00:00'::timestamp without time zone) AND (created_at < '2012-12-16
> > 00:00:00'::timestamp without time zone))
> > Total runtime: 33698.000 ms
> >
> > Basically PG is going through all views again and not using "Index Scan
> > using views_visit_id_index on views". Looks like setting work_mem
> confuses
> > planner somehow. Any idea what can be done to do sorting in memory. I
> > suspect it should make query even more faster. Thanks -Alex
>
> hm, what happens when you set work_mem a fair amount higher? (say,
> 64mb).   You can set it for one session by going "set work_mem='64mb';
> " as opposed to the entire server in postgresql.conf.
>
> merlin
>


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Filip Rembiałkowski
On Tue, Jan 29, 2013 at 8:24 AM, Alex Vinnik  wrote:

> On Mon, Jan 28, 2013 at 6:55 PM, Filip Rembiałkowski 
> wrote:
>
>>
>> do you know pgtune?
>> it's a good tool for starters, if you want a fast postgres and don't
>> really want to learn what's behind the scenes.
>>
> Yeah.. I came across pgtune but noticed that latest version dated
> 2009-10-29 http://pgfoundry.org/frs/?group_id=1000416 which is kind of
> outdated. Tar file has settings for pg 8.3. Is still relevant?
>

Yes, I'm sure it will not do anything bad to your config.


>
>> random_page_cost=1 might be not what you really want.
>> it would mean that random reads are as fast as as sequential reads, which
>> probably is true only for SSD
>>
> What randon_page_cost would be more appropriate for EC2 EBS Provisioned
> volume that can handle 2,000 IOPS?
>
>>
>>
I'd say: don't guess. Measure.
Use any tool that can test sequential disk block reads versus random disk
block reads.
bonnie++ is quite popular.



Filip


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Merlin Moncure
On Tue, Jan 29, 2013 at 8:41 AM, Alex Vinnik  wrote:
> Setting work_mem to 64MB triggers in memory sort but look what happens with
> views look up. PG goes through all records there "Seq Scan on views" instead
> of using visitor_id index and I have only subset of real data to play
> around. Can imagine what cost would be running it against bigger dataset.
> Something else is in play here that makes planner to take this route. Any
> ideas how to gain more insight into planner's inner workings?

did you set effective_cache_seize as noted upthread?

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Ben Chobot
On Jan 29, 2013, at 6:24 AM, Alex Vinnik wrote:

> random_page_cost=1 might be not what you really want. 
> it would mean that random reads are as fast as as sequential reads, which 
> probably is true only for SSD
> What randon_page_cost would be more appropriate for EC2 EBS Provisioned 
> volume that can handle 2,000 IOPS? 

For EC2 Provisioned IOPS volumes - not standard EBS - random_page_cost=1 is 
exactly what you want.



Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Alex Vinnik
On Tue, Jan 29, 2013 at 11:39 AM, Ben Chobot  wrote:

> On Jan 29, 2013, at 6:24 AM, Alex Vinnik wrote:
>
> random_page_cost=1 might be not what you really want.
>> it would mean that random reads are as fast as as sequential reads, which
>> probably is true only for SSD
>>
> What randon_page_cost would be more appropriate for EC2 EBS Provisioned
> volume that can handle 2,000 IOPS?
>
>
> For EC2 Provisioned IOPS volumes - not standard EBS - random_page_cost=1
> is exactly what you want.
>
> Well... after some experimentation it turned out that random_page_cost=0.6
gives me fast query

QUERY PLAN
Sort  (cost=754114.96..754510.46 rows=158199 width=8) (actual
time=1839.324..2035.405 rows=209401 loops=1)
  Sort Key: visits.id, views.id
  Sort Method: quicksort  Memory: 15960kB
  ->  Nested Loop  (cost=0.00..740453.38 rows=158199 width=8) (actual
time=0.048..1531.592 rows=209401 loops=1)
->  Index Scan using visits_created_at_index on visits
 (cost=0.00..5929.82 rows=115492 width=4) (actual time=0.032..161.488
rows=131311 loops=1)
  Index Cond: ((created_at >= '2013-01-15 00:00:00'::timestamp
without time zone) AND (created_at < '2013-01-16 00:00:00'::timestamp
without time zone))
->  Index Scan using views_visit_id_index on views
 (cost=0.00..6.26 rows=10 width=8) (actual time=0.003..0.005 rows=2
loops=131311)
  Index Cond: (visit_id = visits.id)
Total runtime: 2234.142 ms

random_page_cost=0.7 slows it down 16 times

Sort  (cost=804548.42..804943.92 rows=158199 width=8) (actual
time=37011.337..37205.449 rows=209401 loops=1)
  Sort Key: visits.id, views.id
  Sort Method: quicksort  Memory: 15960kB
  ->  Merge Join  (cost=15871.37..790886.85 rows=158199 width=8) (actual
time=35673.602..36714.056 rows=209401 loops=1)
Merge Cond: (visits.id = views.visit_id)
->  Sort  (cost=15824.44..16113.17 rows=115492 width=4) (actual
time=335.486..463.085 rows=131311 loops=1)
  Sort Key: visits.id
  Sort Method: quicksort  Memory: 12300kB
  ->  Index Scan using visits_created_at_index on visits
 (cost=0.00..6113.04 rows=115492 width=4) (actual time=0.034..159.326
rows=131311 loops=1)
Index Cond: ((created_at >= '2013-01-15
00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16
00:00:00'::timestamp without time zone))
->  Index Scan using views_visit_id_visit_buoy_index on views
 (cost=0.00..757596.22 rows=6122770 width=8) (actual time=0.017..30765.316
rows=5145902 loops=1)
Total runtime: 37407.174 ms

I am totally puzzled now...


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Merlin Moncure
On Tue, Jan 29, 2013 at 12:59 PM, Alex Vinnik  wrote:
>
>
>
> On Tue, Jan 29, 2013 at 11:39 AM, Ben Chobot  wrote:
>>
>> On Jan 29, 2013, at 6:24 AM, Alex Vinnik wrote:
>>
>>> random_page_cost=1 might be not what you really want.
>>> it would mean that random reads are as fast as as sequential reads, which
>>> probably is true only for SSD
>>
>> What randon_page_cost would be more appropriate for EC2 EBS Provisioned
>> volume that can handle 2,000 IOPS?
>>
>>
>> For EC2 Provisioned IOPS volumes - not standard EBS - random_page_cost=1
>> is exactly what you want.
>>
> Well... after some experimentation it turned out that random_page_cost=0.6
> gives me fast query
>
> QUERY PLAN
> Sort  (cost=754114.96..754510.46 rows=158199 width=8) (actual
> time=1839.324..2035.405 rows=209401 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: quicksort  Memory: 15960kB
>   ->  Nested Loop  (cost=0.00..740453.38 rows=158199 width=8) (actual
> time=0.048..1531.592 rows=209401 loops=1)
> ->  Index Scan using visits_created_at_index on visits
> (cost=0.00..5929.82 rows=115492 width=4) (actual time=0.032..161.488
> rows=131311 loops=1)
>   Index Cond: ((created_at >= '2013-01-15 00:00:00'::timestamp
> without time zone) AND (created_at < '2013-01-16 00:00:00'::timestamp
> without time zone))
> ->  Index Scan using views_visit_id_index on views  (cost=0.00..6.26
> rows=10 width=8) (actual time=0.003..0.005 rows=2 loops=131311)
>   Index Cond: (visit_id = visits.id)
> Total runtime: 2234.142 ms
>
> random_page_cost=0.7 slows it down 16 times
>
> Sort  (cost=804548.42..804943.92 rows=158199 width=8) (actual
> time=37011.337..37205.449 rows=209401 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: quicksort  Memory: 15960kB
>   ->  Merge Join  (cost=15871.37..790886.85 rows=158199 width=8) (actual
> time=35673.602..36714.056 rows=209401 loops=1)
> Merge Cond: (visits.id = views.visit_id)
> ->  Sort  (cost=15824.44..16113.17 rows=115492 width=4) (actual
> time=335.486..463.085 rows=131311 loops=1)
>   Sort Key: visits.id
>   Sort Method: quicksort  Memory: 12300kB
>   ->  Index Scan using visits_created_at_index on visits
> (cost=0.00..6113.04 rows=115492 width=4) (actual time=0.034..159.326
> rows=131311 loops=1)
> Index Cond: ((created_at >= '2013-01-15
> 00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16
> 00:00:00'::timestamp without time zone))

> ->  Index Scan using views_visit_id_visit_buoy_index on views
> (cost=0.00..757596.22 rows=6122770 width=8) (actual time=0.017..30765.316
> rows=5145902 loops=1)

Something is awry here. pg is doing an index scan via
views_visit_id_visit_buoy_index with no matching condition.  What's
the definition of that index? The reason why the random_page_cost
adjustment is working is that you are highly penalizing sequential
type scans so that the database is avoiding the merge (sort A, sort B,
stepwise compare).

SQL server is doing a nestloop/index scan, just like the faster pg
plan, but is a bit faster because it's parallelizing.

 merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Jeff Janes
On Mon, Jan 28, 2013 at 3:43 PM, Alex Vinnik  wrote:
> It sure turned out that default settings are not a good fit. Setting
> random_page_cost to 1.0 made query to run in 2.6 seconds and I clearly see
> that indexes are being used in explain plan and IO utilization is close to
> 0.
>
> QUERY PLAN
> Sort  (cost=969787.23..970288.67 rows=200575 width=8) (actual
> time=2176.045..2418.162 rows=241238 loops=1)
>   Sort Key: visits.id, views.id
>   Sort Method: external sort  Disk: 4248kB

What query are you running?  The query you originally showed us should
not be doing this sort in the first place.

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Alex Vinnik
On Tue, Jan 29, 2013 at 2:06 PM, Jeff Janes  wrote:

> >   Sort Key: visits.id, views.id
> >   Sort Method: external sort  Disk: 4248kB
>
> What query are you running?  The query you originally showed us should
> not be doing this sort in the first place.
>
> Cheers,
>
> Jeff
>

Here is the query

select visits.id, views.id
from visits join views on visits.id = views.visit_id
where visits.created_at >= '01/15/2013' and visits.created_at < '01/16/2013'
order by visits.id, views.id;

Original query didn't have order by clause

Here query plan w/o order by
Merge Join  (cost=18213.46..802113.80 rows=182579 width=8) (actual
time=13.693..145469.499 rows=209401 loops=1)
  Merge Cond: (visits.id = views.visit_id)
  ->  Sort  (cost=18195.47..18523.91 rows=131373 width=4) (actual
time=335.496..464.929 rows=131311 loops=1)
Sort Key: visits.id
Sort Method: quicksort  Memory: 12300kB
->  Index Scan using visits_created_at_index on visits
 (cost=0.00..7026.59 rows=131373 width=4) (actual time=0.037..162.047
rows=131311 loops=1)
  Index Cond: ((created_at >= '2013-01-15 00:00:00'::timestamp
without time zone) AND (created_at < '2013-01-16 00:00:00'::timestamp
without time zone))
  ->  Index Scan using views_visit_id_visit_buoy_index on views
 (cost=0.00..766120.99 rows=6126002 width=8) (actual
time=18.960..140565.130 rows=4014837 loops=1)
Total runtime: 145664.274 ms


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Alex Vinnik
index definition
CREATE INDEX views_visit_id_visit_buoy_index ON views USING btree
(visit_id, visit_buoy)



On Tue, Jan 29, 2013 at 1:35 PM, Merlin Moncure  wrote:

> On Tue, Jan 29, 2013 at 12:59 PM, Alex Vinnik 
> wrote:
> >
> >
> >
> > On Tue, Jan 29, 2013 at 11:39 AM, Ben Chobot 
> wrote:
> >>
> >> On Jan 29, 2013, at 6:24 AM, Alex Vinnik wrote:
> >>
> >>> random_page_cost=1 might be not what you really want.
> >>> it would mean that random reads are as fast as as sequential reads,
> which
> >>> probably is true only for SSD
> >>
> >> What randon_page_cost would be more appropriate for EC2 EBS Provisioned
> >> volume that can handle 2,000 IOPS?
> >>
> >>
> >> For EC2 Provisioned IOPS volumes - not standard EBS - random_page_cost=1
> >> is exactly what you want.
> >>
> > Well... after some experimentation it turned out that
> random_page_cost=0.6
> > gives me fast query
> >
> > QUERY PLAN
> > Sort  (cost=754114.96..754510.46 rows=158199 width=8) (actual
> > time=1839.324..2035.405 rows=209401 loops=1)
> >   Sort Key: visits.id, views.id
> >   Sort Method: quicksort  Memory: 15960kB
> >   ->  Nested Loop  (cost=0.00..740453.38 rows=158199 width=8) (actual
> > time=0.048..1531.592 rows=209401 loops=1)
> > ->  Index Scan using visits_created_at_index on visits
> > (cost=0.00..5929.82 rows=115492 width=4) (actual time=0.032..161.488
> > rows=131311 loops=1)
> >   Index Cond: ((created_at >= '2013-01-15
> 00:00:00'::timestamp
> > without time zone) AND (created_at < '2013-01-16 00:00:00'::timestamp
> > without time zone))
> > ->  Index Scan using views_visit_id_index on views
>  (cost=0.00..6.26
> > rows=10 width=8) (actual time=0.003..0.005 rows=2 loops=131311)
> >   Index Cond: (visit_id = visits.id)
> > Total runtime: 2234.142 ms
> >
> > random_page_cost=0.7 slows it down 16 times
> >
> > Sort  (cost=804548.42..804943.92 rows=158199 width=8) (actual
> > time=37011.337..37205.449 rows=209401 loops=1)
> >   Sort Key: visits.id, views.id
> >   Sort Method: quicksort  Memory: 15960kB
> >   ->  Merge Join  (cost=15871.37..790886.85 rows=158199 width=8) (actual
> > time=35673.602..36714.056 rows=209401 loops=1)
> > Merge Cond: (visits.id = views.visit_id)
> > ->  Sort  (cost=15824.44..16113.17 rows=115492 width=4) (actual
> > time=335.486..463.085 rows=131311 loops=1)
> >   Sort Key: visits.id
> >   Sort Method: quicksort  Memory: 12300kB
> >   ->  Index Scan using visits_created_at_index on visits
> > (cost=0.00..6113.04 rows=115492 width=4) (actual time=0.034..159.326
> > rows=131311 loops=1)
> > Index Cond: ((created_at >= '2013-01-15
> > 00:00:00'::timestamp without time zone) AND (created_at < '2013-01-16
> > 00:00:00'::timestamp without time zone))
>
> > ->  Index Scan using views_visit_id_visit_buoy_index on views
> > (cost=0.00..757596.22 rows=6122770 width=8) (actual time=0.017..30765.316
> > rows=5145902 loops=1)
>
> Something is awry here. pg is doing an index scan via
> views_visit_id_visit_buoy_index with no matching condition.  What's
> the definition of that index? The reason why the random_page_cost
> adjustment is working is that you are highly penalizing sequential
> type scans so that the database is avoiding the merge (sort A, sort B,
> stepwise compare).
>
> SQL server is doing a nestloop/index scan, just like the faster pg
> plan, but is a bit faster because it's parallelizing.
>
>  merlin
>


Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Jeff Janes
On Mon, Jan 28, 2013 at 4:55 PM, Filip Rembiałkowski
 wrote:
>
> On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik  wrote:
>>
>> It sure turned out that default settings are not a good fit.
>
>
> do you know pgtune?
> it's a good tool for starters, if you want a fast postgres and don't really
> want to learn what's behind the scenes.
>
> random_page_cost=1 might be not what you really want.
> it would mean that random reads are as fast as as sequential reads, which
> probably is true only for SSD

Or that the "reads" are cached and coming from RAM, which is almost
surely the case here.

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance