It looks like you guys were right. I think vacuum analyzing this made it do an IOS. It seems like materialized views are going to be the best bet. I see how that would allow sequential reading. Thanks!

Bryce

December 12, 2013 4:01 PM
On Thu, Dec 12, 2013 at 3:04 PM, Bryce Covert <br...@brycecovertoperations.com> wrote:
Not sure if this is helpful, but I tried upgrading to 9.2, and here's what I got:

---------
 Limit  (cost=0.00..535.78 rows=50 width=8) (actual time=1037.376..135043.945 rows=50 loops=1)
   Output: premiseaccount.id, (sum(electricusage.usage))
   Buffers: shared hit=4851 read=18718
   ->  GroupAggregate  (cost=0.00..198012.28 rows=18479 width=8) (actual time=1037.369..135043.700 rows=50 loops=1)
         Output: premiseaccount.id, sum(electricusage.usage)

         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
         Rows Removed by Filter: 1476
         Buffers: shared hit=4851 read=18718
         ->  Nested Loop  (cost=0.00..196247.46 rows=148764 width=8) (actual time=107.092..134845.231 rows=15188 loops=1)
               Output: premiseaccount.id, electricusage.usage
               Buffers: shared hit=4851 read=18718
               ->  Index Only Scan using premiseaccount_bucket_58c70392619aa36f on public.premiseaccount premiseaccount  (cost=0.00..43135.13 rows=18479 width=4) (actual time=45.368..137.340 rows=1527 loops=1)
                     Output: premiseaccount.bucket, premiseaccount.id
                     Index Cond: (premiseaccount.bucket = '85375_single-family'::text)
                     Heap Fetches: 1527
 
You had to hit the heap for every row, meaning the index-only feature was useless.  Are you vacuuming enough?  How fast does this table change?  What is relallvisible from pg_class for these tables?

 
               ->  Index Scan using electricusage_premise_account_id on public.electricusage electricusage  (cost=0.00..8.20 rows=9 width=8) (actual time=22.306..88.136 rows=10 loops=1527)
                     Output: electricusage.id, electricusage.created, electricusage.modified, electricusage.from_date, electricusage.to_date, electricusage.usage, electricusage.demand, electricusage.bill_amount, electricusage.premise_account_id
                     Index Cond: (electricusage.premise_account_id = premiseaccount.id)
                     Filter: (electricusage.from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
                     Rows Removed by Filter: 2
                     Buffers: shared hit=4850 read=18033
 Total runtime: 135044.256 ms
(23 rows)


Looks like it is doing an index only scan for the first table, but not for the second. I tried  creating two indexes that theoretically should make it not have to go to the physical table.:
    "electricusage_premise_account_id_36bc8999ced10059" btree (premise_account_id, from_date, usage)
    "ix_covered_2" btree (premise_account_id, from_date DESC, usage, id)

Any idea why it's not using that?

If the other IOS in this plan is anything to go by, then your table doesn't have enough all-visible pages to make it worthwhile.  So it chooses the smaller index, instead of the bigger one that could theoretically support an IOS.

Cheers,

Jeff
December 12, 2013 3:04 PM
Not sure if this is helpful, but I tried upgrading to 9.2, and here's what I got:

---------
 Limit  (cost=0.00..535.78 rows=50 width=8) (actual time=1037.376..135043.945 rows=50 loops=1)
   Output: premiseaccount.id, (sum(electricusage.usage))
   Buffers: shared hit=4851 read=18718
   ->  GroupAggregate  (cost=0.00..198012.28 rows=18479 width=8) (actual time=1037.369..135043.700 rows=50 loops=1)
         Output: premiseaccount.id, sum(electricusage.usage)
         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
         Rows Removed by Filter: 1476
         Buffers: shared hit=4851 read=18718
         ->  Nested Loop  (cost=0.00..196247.46 rows=148764 width=8) (actual time=107.092..134845.231 rows=15188 loops=1)
               Output: premiseaccount.id, electricusage.usage
               Buffers: shared hit=4851 read=18718
               ->  Index Only Scan using premiseaccount_bucket_58c70392619aa36f on public.premiseaccount premiseaccount  (cost=0.00..43135.13 rows=18479 width=4) (actual time=45.368..137.340 rows=1527 loops=1)
                     Output: premiseaccount.bucket, premiseaccount.id
                     Index Cond: (premiseaccount.bucket = '85375_single-family'::text)
                     Heap Fetches: 1527
                     Buffers: shared hit=1 read=685
               ->  Index Scan using electricusage_premise_account_id on public.electricusage electricusage  (cost=0.00..8.20 rows=9 width=8) (actual time=22.306..88.136 rows=10 loops=1527)
                     Output: electricusage.id, electricusage.created, electricusage.modified, electricusage.from_date, electricusage.to_date, electricusage.usage, electricusage.demand, electricusage.bill_amount, electricusage.premise_account_id
                     Index Cond: (electricusage.premise_account_id = premiseaccount.id)
                     Filter: (electricusage.from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
                     Rows Removed by Filter: 2
                     Buffers: shared hit=4850 read=18033
 Total runtime: 135044.256 ms
(23 rows)


Looks like it is doing an index only scan for the first table, but not for the second. I tried  creating two indexes that theoretically should make it not have to go to the physical table.:
    "electricusage_premise_account_id_36bc8999ced10059" btree (premise_account_id, from_date, usage)
    "ix_covered_2" btree (premise_account_id, from_date DESC, usage, id)

Any idea why it's not using that?

Thanks!
Bryce

December 12, 2013 1:35 PM

On Thu, Dec 12, 2013 at 6:16 PM, Bryce Covert <br...@brycecovertoperations.com> wrote:

Thanks a lot for the help. I'm not familiar with explain buffers, but here's the results:

 Limit  (cost=0.00..648.71 rows=50 width=8) (actual time=653.681..52328.707 rows=50 loops=1)
   Buffers: shared hit=7875 read=9870
   ->  GroupAggregate  (cost=0.00..55672.36 rows=4291 width=8) (actual time=653.671..52328.480 rows=50 loops=1)

         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
         Buffers: shared hit=7875 read=9870
         ->  Nested Loop  (cost=0.00..55262.93 rows=34506 width=8) (actual time=432.129..52200.465 rows=30711 loops=1)
               Buffers: shared hit=7875 read=9870
               ->  Index Scan using premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount  (cost=0.00..15433.71 rows=4291 width=4) (actual time=338.160..10014.780 rows=3437 loops=1)
                     Index Cond: ((bucket)::text = '85349_single-family'::text)
                     Buffers: shared hit=744 read=2692
               ->  Index Scan using electricusage_premise_account_id on electricusage electricusage  (cost=0.00..9.17 rows=9 width=8) (actual time=11.430..12.235 rows=9 loops=3437)
                     Index Cond: (premise_account_id = premiseaccount.id)
                     Filter: (from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
                     Buffers: shared hit=7131 read=7178
 Total runtime: 52329.028 ms
(15 rows)

and the iostat results...
Linux 3.11.6-x86_64-linode35 (preview-aps-new)     12/12/2013     _x86_64_    (8 CPU)

...
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.10    47.10  152.40    7.80     2.20     0.21    30.92     1.36    8.50    8.03   17.53   6.14  98.33
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00


This means it's doing random I/O, and that your disk is a single 7200RPM drive (152 r/s being typical for that hardware).

You can improve this by either:

1 - Turning that random I/O pattern into sequential, or
2 - Getting better I/O.

I'll assume 2 isn't available to you on linode, so for 1, you could try lowering effective_cache_size substantially. It seems you're not getting nearly as much caching as you think (ie 2GB). However, I doubt there's a plan that can get you significantly better performance given your hardware.

You may shave a few seconds, though, if you increase work_mem. It seems it should have used a bitmap index scan for at least one of the index scans there, and a low work_mem could be what's limiting the planner's possibilities. What are your settings in that area?


December 12, 2013 1:16 PM
Hey Claudio,

Thanks a lot for the help. I'm not familiar with explain buffers, but here's the results:

 Limit  (cost=0.00..648.71 rows=50 width=8) (actual time=653.681..52328.707 rows=50 loops=1)
   Buffers: shared hit=7875 read=9870
   ->  GroupAggregate  (cost=0.00..55672.36 rows=4291 width=8) (actual time=653.671..52328.480 rows=50 loops=1)
         Filter: ((sum(electricusage.usage) >= 3284::numeric) AND (sum(electricusage.usage) <= 3769::numeric))
         Buffers: shared hit=7875 read=9870
         ->  Nested Loop  (cost=0.00..55262.93 rows=34506 width=8) (actual time=432.129..52200.465 rows=30711 loops=1)
               Buffers: shared hit=7875 read=9870
               ->  Index Scan using premiseaccount_bucket_58c70392619aa36f on premiseaccount premiseaccount  (cost=0.00..15433.71 rows=4291 width=4) (actual time=338.160..10014.780 rows=3437 loops=1)
                     Index Cond: ((bucket)::text = '85349_single-family'::text)
                     Buffers: shared hit=744 read=2692
               ->  Index Scan using electricusage_premise_account_id on electricusage electricusage  (cost=0.00..9.17 rows=9 width=8) (actual time=11.430..12.235 rows=9 loops=3437)
                     Index Cond: (premise_account_id = premiseaccount.id)
                     Filter: (from_date >= '2012-11-20 00:00:00+00'::timestamp with time zone)
                     Buffers: shared hit=7131 read=7178
 Total runtime: 52329.028 ms
(15 rows)

and the iostat results...
Linux 3.11.6-x86_64-linode35 (preview-aps-new)     12/12/2013     _x86_64_    (8 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              6.94    65.68   12.16    7.64     0.36     0.29    67.06     0.44   22.11    3.56   51.66   1.15   2.28
xvdb              0.00     0.00    0.00    0.00     0.00     0.00    10.61     0.00    1.82    1.82    0.00   1.82   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.10    47.10  152.40    7.80     2.20     0.21    30.92     1.36    8.50    8.03   17.53   6.14  98.33
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.20     2.30  212.10    0.70     3.22     0.01    31.09     1.47    6.88    6.90    2.86   4.63  98.62
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.20     8.30  183.20    5.10     2.46     0.05    27.31     1.68    8.85    6.68   86.78   5.24  98.70
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.10     0.00  165.70    0.00     2.36     0.00    29.20     1.46    8.86    8.86    0.00   5.95  98.63
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

I'm not sure how to process this except that 2.5MB/s seems really slow, and it looks like it is using postgres' cache quite a bit.

Thanks,
Bryce


December 12, 2013 12:48 PM



Well, did you run benchmarks? How many IOPS do you get from the volumes?

Try running "iostat -x -m -d 10" while the slow query is running and pasting the results (or a relevant sample of them).

Also, do run "explain (analyze, buffers)" instead of plain "explain analyze".

Reply via email to