On Fri, Mar 15, 2013 at 11:09 AM, Ao Jianwang <aojw2...@gmail.com> wrote:
> Hi Rumman,
>
> Thanks for your response. I followed the guide to build the partitions. The
> settings should be good. See the following result. Any insight? Thanks.
>
> dailyest=# select version();
>                                                    version
> ------------------------------------------------------------------------------------------------------------
>  PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc
> (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
> (1 row)
>
> dailyest=# show constraint_exclusion;
>  constraint_exclusion
> ----------------------
>  on
> (1 row)
>
> On Fri, Mar 15, 2013 at 11:04 PM, AI Rumman <rumman...@gmail.com> wrote:
>
>> Which version of PostgreSQL are you using?
>> Have you set constraint_exclusion to partition?
>>
>>
>> On Fri, Mar 15, 2013 at 11:02 AM, Ao Jianwang <aojw2...@gmail.com> wrote:
>>
>>> Hi Experts,
>>>
>>> I found that if we join the master table with another small table, the
>>> running time is slow, while if we join each child table with the small
>>> table, it's very fast. Any comments and suggestions are greatly
>>> appreciated.
>>>
>>> *For example, par_list is small (about 50k rows), while par_est is very
>>> large (about 400MB per day), so we partition it by day. However, the query
>>> plan for joining the master table with par_list is bad, so the running
>>> time is slow. A good plan would join each partition table with par_list
>>> separately and then aggregate the results.*
>>>
>>> *1. Join the master table with the small table. It's slow.*
>>> dailyest=# explain (analyze on, buffers on)
>>> dailyest-# SELECT e.date, max(e.estimate)
>>> dailyest-# FROM
>>> dailyest-# par_list l,
>>> dailyest-# par_est e
>>> dailyest-# WHERE
>>> dailyest-# l.id = e.list_id and
>>> dailyest-# e.date BETWEEN '2012-07-08' and '2012-07-10' and
>>> dailyest-# l.fid = 1 and
>>> dailyest-# l.sid = 143441 and
>>> dailyest-# l.cid in (36, 39, 6000) and
>>> dailyest-# e.aid = 333710667
>>> dailyest-# GROUP BY e.date
>>> dailyest-# ORDER BY e.date;
>>>
>>> -----------------------
>>>  GroupAggregate  (cost=745326.86..745326.88 rows=1 width=8) (actual time=6281.364..6281.366 rows=3 loops=1)
>>>    Buffers: shared hit=3 read=175869
>>>    ->  Sort  (cost=745326.86..745326.86 rows=1 width=8) (actual time=6281.358..6281.358 rows=6 loops=1)
>>>          Sort Key: e.date
>>>          Sort Method: quicksort  Memory: 25kB
>>>          Buffers: shared hit=3 read=175869
>>>          ->  Nested Loop  (cost=0.00..745326.85 rows=1 width=8) (actual time=1228.493..6281.349 rows=6 loops=1)
>>>                Join Filter: (l.id = e.list_id)
>>>                Rows Removed by Join Filter: 4040
>>>                Buffers: shared hit=3 read=175869
>>>                ->  Seq Scan on par_list l  (cost=0.00..1213.10 rows=2 width=4) (actual time=0.010..38.272 rows=2 loops=1)
>>>                      Filter: ((fid = 1) AND (sid = 143441) AND (cid = ANY ('{36,39,6000}'::integer[])))
>>>                      Rows Removed by Filter: 50190
>>>                      Buffers: shared hit=3 read=269
>>>                ->  Materialize  (cost=0.00..744102.56 rows=407 width=12) (actual time=9.707..3121.053 rows=2023 loops=2)
>>>                      Buffers: shared read=175600
>>>                      ->  Append  (cost=0.00..744100.52 rows=407 width=12) (actual time=19.410..6240.044 rows=2023 loops=1)
>>>                            Buffers: shared read=175600
>>>                            ->  Seq Scan on par_est e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.001..0.001 rows=0 loops=1)
>>>                                  Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
>>>                            ->  Seq Scan on par_est_2012_07 e  (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.000 rows=0 loops=1)
>>>                                  Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
>>>                            ->  Seq Scan on par_est_2012_07_08 e  (cost=0.00..247736.09 rows=135 width=12) (actual time=19.408..2088.627 rows=674 loops=1)
>>>                                  Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
>>>                                  Rows Removed by Filter: 10814878
>>>                                  Buffers: shared read=58463
>>>                            ->  Seq Scan on par_est_2012_07_09 e  (cost=0.00..248008.81 rows=137 width=12) (actual time=6.390..1963.238 rows=676 loops=1)
>>>                                  Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
>>>                                  Rows Removed by Filter: 10826866
>>>                                  Buffers: shared read=58528
>>>                            ->  Seq Scan on par_est_2012_07_10 e  (cost=0.00..248355.62 rows=133 width=12) (actual time=15.135..2187.312 rows=673 loops=1)
>>>                                  Filter: ((date >= '2012-07-08'::date) AND (date <= '2012-07-10'::date) AND (aid = 333710667))
>>>                                  Rows Removed by Filter: 10841989
>>>                                  Buffers: shared read=58609
>>>  Total runtime: 6281.444 ms
>>> (35 rows)
>>>
>>>
>>> *2. Join each partition table with the small table (par_list) and union
>>> the results. This runs very fast. However, it's not practical to union 180
>>> SELECT statements (for example, when the date range is 2012-07-01 to
>>> 2012-12-31). Any better suggestions?*
>>>
>>> dailyest=# explain (analyze on, buffers on)
>>> dailyest-# SELECT e.date, max(e.estimate)
>>> dailyest-# FROM
>>> dailyest-# par_list l,
>>> dailyest-# par_est_2012_07_08 e
>>> dailyest-# WHERE
>>> dailyest-# l.id = e.list_id and
>>> dailyest-# e.date = '2012-07-08' and
>>> dailyest-# l.fid = 1 and
>>> dailyest-# l.sid = 143441 and
>>> dailyest-# l.cid in (36, 39, 6000) and
>>> dailyest-# e.aid = 333710667
>>> dailyest-# GROUP BY e.date
>>> dailyest-# UNION ALL
>>> dailyest-# SELECT e.date, max(e.estimate)
>>> dailyest-# FROM
>>> dailyest-# par_list l,
>>> dailyest-# par_est_2012_07_09 e
>>> dailyest-# WHERE
>>> dailyest-# l.id = e.list_id and
>>> dailyest-# e.date = '2012-07-09' and
>>> dailyest-# l.fid = 1 and
>>> dailyest-# l.sid = 143441 and
>>> dailyest-# l.cid in (36, 39, 6000) and
>>> dailyest-# e.aid = 333710667
>>> dailyest-# GROUP BY e.date
>>> dailyest-# UNION ALL
>>> dailyest-# SELECT e.date, max(e.estimate)
>>> dailyest-# FROM
>>> dailyest-# par_list l,
>>> dailyest-# par_est_2012_07_10 e
>>> dailyest-# WHERE
>>> dailyest-# l.id = e.list_id and
>>> dailyest-# e.date = '2012-07-10' and
>>> dailyest-# l.fid = 1 and
>>> dailyest-# l.sid = 143441 and
>>> dailyest-# l.cid in (36, 39, 6000) and
>>> dailyest-# e.aid = 333710667
>>> dailyest-# GROUP BY e.date
>>> dailyest-# ;
>>>
>>>                                         QUERY PLAN
>>> ----------------------------------------------------------------------------------------------------------------------------------------------
>>>  Result  (cost=0.00..91.49 rows=3 width=8) (actual time=83.736..254.912 rows=3 loops=1)
>>>    Buffers: shared hit=27 read=28
>>>    ->  Append  (cost=0.00..91.49 rows=3 width=8) (actual time=83.735..254.910 rows=3 loops=1)
>>>          Buffers: shared hit=27 read=28
>>>          ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=83.735..83.735 rows=1 loops=1)
>>>                Buffers: shared hit=9 read=12
>>>                ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=63.920..83.728 rows=2 loops=1)
>>>                      Buffers: shared hit=9 read=12
>>>                      ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=1.540..1.550 rows=2 loops=1)
>>>                            Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
>>>                            Buffers: shared hit=7 read=4
>>>                      ->  Index Only Scan using par_est_2012_07_08_pkey on par_est_2012_07_08 e  (cost=0.00..5.94 rows=1 width=12) (actual time=41.083..41.083 rows=1 loops=2)
>>>                            Index Cond: ((date = '2012-07-08'::date) AND (list_id = l.id) AND (aid = 333710667))
>>>                            Heap Fetches: 0
>>>                            Buffers: shared hit=2 read=8
>>>          ->  GroupAggregate  (cost=0.00..30.48 rows=1 width=8) (actual time=76.911..76.911 rows=1 loops=1)
>>>                Buffers: shared hit=9 read=8
>>>                ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=57.580..76.909 rows=2 loops=1)
>>>                      Buffers: shared hit=9 read=8
>>>                      ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.016 rows=2 loops=1)
>>>                            Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
>>>                            Buffers: shared hit=7
>>>                      ->  Index Only Scan using par_est_2012_07_09_pkey on par_est_2012_07_09 e  (cost=0.00..5.94 rows=1 width=12) (actual time=38.440..38.442 rows=1 loops=2)
>>>                            Index Cond: ((date = '2012-07-09'::date) AND (list_id = l.id) AND (aid = 333710667))
>>>                            Heap Fetches: 0
>>>                            Buffers: shared hit=2 read=8
>>>          ->  GroupAggregate  (cost=0.00..30.49 rows=1 width=8) (actual time=94.262..94.262 rows=1 loops=1)
>>>                Buffers: shared hit=9 read=8
>>>                ->  Nested Loop  (cost=0.00..30.47 rows=1 width=8) (actual time=74.393..94.259 rows=2 loops=1)
>>>                      Buffers: shared hit=9 read=8
>>>                      ->  Index Scan using par_list_sid_fid_cid_key on par_list l  (cost=0.00..18.56 rows=2 width=4) (actual time=0.007..0.017 rows=2 loops=1)
>>>                            Index Cond: ((sid = 143441) AND (fid = 1) AND (cid = ANY ('{36,39,6000}'::integer[])))
>>>                            Buffers: shared hit=7
>>>                      ->  Index Only Scan using par_est_2012_07_10_pkey on par_est_2012_07_10 e  (cost=0.00..5.95 rows=1 width=12) (actual time=47.116..47.117 rows=1 loops=2)
>>>                            Index Cond: ((date = '2012-07-10'::date) AND (list_id = l.id) AND (aid = 333710667))
>>>                            Heap Fetches: 0
>>>                            Buffers: shared hit=2 read=8
>>>  Total runtime: 255.074 ms
>>> (38 rows)
>>>
>>>
>>
>

First, you may try the following and find out whether partition constraint exclusion is working or not:

explain
SELECT *
FROM par_est e
WHERE e.date BETWEEN '2012-07-08' AND '2012-07-10';
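If that EXPLAIN still shows every child of par_est being scanned, the usual checklist for 9.2-style inheritance partitioning is roughly the sketch below. This follows the documented setup, not your actual schema: the constraint name and definition are illustrative and assume each child is meant to hold one day of data keyed on "date".

-- 'partition' is the documented setting for inheritance partitioning
-- ('on' also works, but evaluates constraints for every query)
SET constraint_exclusion = partition;

-- Each child needs a CHECK constraint the planner can prove against the
-- WHERE clause. Illustrative only; adjust to however your children are defined.
ALTER TABLE par_est_2012_07_08
    ADD CONSTRAINT par_est_2012_07_08_date_check
    CHECK (date >= DATE '2012-07-08' AND date < DATE '2012-07-09');

With constraints like that in place, the Append node in the plan should list only the children whose date ranges overlap the BETWEEN predicate (plus the normally empty parent), which is what you want to see in the EXPLAIN above.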