Re: Performance of Query 60 on TPC-DS Benchmark

Nikita Malakhov Wed, 27 Nov 2024 23:58:34 -0800

Hi!

I would rather do not exclude add_partial_path_precheck, but modify it to
check just path costs
and do not count key chains length:


 foreach(p1, parent_rel->partial_pathlist)
{
   Path   *old_path = (Path *) lfirst(p1);
   if (total_cost > old_path->total_cost * STD_FUZZ_FACTOR)
      return false;
   if (old_path->total_cost > total_cost * STD_FUZZ_FACTOR)
      return true;
}

While running this modification I've got the following plan on current
master:

        QUERY PLAN
    >
----------------------------------------------------------------------------------------------------------------------------------------------------------->
 Limit  (cost=70.29..70.47 rows=3 width=100) (actual time=0.079..0.083
rows=0 loops=1)
   ->  Incremental Sort  (cost=70.29..70.47 rows=3 width=100) (actual
time=0.078..0.082 rows=0 loops=1)
         Sort Key: item.i_item_id,
(sum((sum(store_sales.ss_ext_sales_price))))
         Presorted Key: item.i_item_id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB
 Peak Memory: 25kB
         ->  GroupAggregate  (cost=70.26..70.32 rows=3 width=100) (actual
time=0.033..0.037 rows=0 loops=1)
               Group Key: item.i_item_id
               ->  Sort  (cost=70.26..70.27 rows=3 width=100) (actual
time=0.033..0.036 rows=0 loops=1)
                     Sort Key: item.i_item_id
                     Sort Method: quicksort  Memory: 25kB
                     ->  Append  (cost=23.42..70.23 rows=3 width=100)
(actual time=0.030..0.033 rows=0 loops=1)
                           ->  GroupAggregate  (cost=23.42..23.44 rows=1
width=100) (actual time=0.013..0.015 rows=0 loops=1)
                                 Group Key: item.i_item_id
                                 ->  Sort  (cost=23.42..23.43 rows=1
width=82) (actual time=0.013..0.014 rows=0 loops=1)
                                       Sort Key: item.i_item_id
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Nested Loop  (cost=10.96..23.41
rows=1 width=82) (actual time=0.006..0.008 rows=0 loops=1)
                                             ->  Nested Loop
 (cost=10.81..22.96 rows=1 width=86) (actual time=0.006..0.008 rows=0
loops=1)
                                                   ->  Nested Loop
 (cost=10.66..22.33 rows=2 width=90) (actual time=0.006..0.007 rows=0
loops=1)
                                                         ->  Hash Semi Join
 (cost=10.51..21.03 rows=1 width=72) (actual time=0.005..0.006 rows=0
loops=1)
                                                               Hash Cond:
(item.i_item_id = item_1.i_item_id)
                                                               ->  Seq Scan
on item  (cost=0.00..10.40 rows=40 width=72) (actual time=0.005..0.005
rows=0 l>
                                                               ->  Hash
 (cost=10.50..10.50 rows=1 width=68) (never executed)
                                                                     ->
 Seq Scan on item item_1  (cost=0.00..10.50 rows=1 width=68) (never
executed)

 Filter: (i_category = 'Children'::bpchar)
                                                         ->  Index Scan
using store_sales_pkey on store_sales  (cost=0.15..1.28 rows=2 width=26)
(never exe>
                                                               Index Cond:
(ss_item_sk = item.i_item_sk)
                                                   ->  Memoize
 (cost=0.15..0.30 rows=1 width=4) (never executed)
                                                         Cache Key:
store_sales.ss_addr_sk
                                                         Cache Mode: logical
                                                         ->  Index Scan
using customer_address_pkey on customer_address  (cost=0.14..0.29 rows=1
width=4) (>
                                                               Index Cond:
(ca_address_sk = store_sales.ss_addr_sk)
                                                               Filter:
(ca_gmt_offset = '-6'::numeric)
                                             ->  Index Scan using
date_dim_pkey on date_dim  (cost=0.15..0.30 rows=1 width=4) (never executed)
                                                   Index Cond: (d_date_sk =
store_sales.ss_sold_date_sk)
                                                   Filter: ((d_year = 1999)
AND (d_moy = 9))
                           ->  GroupAggregate  (cost=23.37..23.39 rows=1
width=100) (actual time=0.008..0.009 rows=0 loops=1)
                                 Group Key: item_2.i_item_id
                                 ->  Sort  (cost=23.37..23.37 rows=1
width=82) (actual time=0.008..0.009 rows=0 loops=1)
                                       Sort Key: item_2.i_item_id
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Nested Loop  (cost=10.95..23.36
rows=1 width=82) (actual time=0.002..0.003 rows=0 loops=1)
                                             ->  Nested Loop
 (cost=10.81..22.83 rows=1 width=86) (actual time=0.002..0.002 rows=0
loops=1)
                                                   ->  Nested Loop
 (cost=10.66..22.30 rows=1 width=90) (actual time=0.001..0.002 rows=0
loops=1)
                                                         ->  Hash Semi Join
 (cost=10.51..21.03 rows=1 width=72) (actual time=0.001..0.002 rows=0
loops=1)
                                                               Hash Cond:
(item_2.i_item_id = item_3.i_item_id)
                                                               ->  Seq Scan
on item item_2  (cost=0.00..10.40 rows=40 width=72) (actual
time=0.001..0.001 r>
                                                               ->  Hash
 (cost=10.50..10.50 rows=1 width=68) (never executed)
                                                                     ->
 Seq Scan on item item_3  (cost=0.00..10.50 rows=1 width=68) (never
executed)

 Filter: (i_category = 'Children'::bpchar)
                                                         ->  Index Scan
using catalog_sales_pkey on catalog_sales  (cost=0.15..1.26 rows=1
width=26) (never>
                                                               Index Cond:
(cs_item_sk = item_2.i_item_sk)
                                                   ->  Index Scan using
date_dim_pkey on date_dim date_dim_1  (cost=0.15..0.34 rows=1 width=4)
(never execu>
                                                         Index Cond:
(d_date_sk = catalog_sales.cs_sold_date_sk)
                                                         Filter: ((d_year =
1999) AND (d_moy = 9))
                                             ->  Index Scan using
customer_address_pkey on customer_address customer_address_1
 (cost=0.14..0.33 rows=1 wid>
                                                   Index Cond:
(ca_address_sk = catalog_sales.cs_bill_addr_sk)
                                                   Filter: (ca_gmt_offset =
'-6'::numeric)
                           ->  GroupAggregate  (cost=23.37..23.39 rows=1
width=100) (actual time=0.008..0.008 rows=0 loops=1)
                                 Group Key: item_4.i_item_id
                                 ->  Sort  (cost=23.37..23.37 rows=1
width=82) (actual time=0.007..0.008 rows=0 loops=1)
                                       Sort Key: item_4.i_item_id
                                       Sort Method: quicksort  Memory: 25kB
                                       ->  Nested Loop  (cost=10.95..23.36
rows=1 width=82) (actual time=0.001..0.001 rows=0 loops=1)
                                             ->  Nested Loop
 (cost=10.81..22.83 rows=1 width=86) (actual time=0.001..0.001 rows=0
loops=1)
                                                   ->  Nested Loop
 (cost=10.66..22.30 rows=1 width=90) (actual time=0.001..0.001 rows=0
loops=1)
                                                         ->  Hash Semi Join
 (cost=10.51..21.03 rows=1 width=72) (actual time=0.001..0.001 rows=0
loops=1)
                                                               Hash Cond:
(item_4.i_item_id = item_5.i_item_id)
                                                               ->  Seq Scan
on item item_4  (cost=0.00..10.40 rows=40 width=72) (actual
time=0.000..0.000 r>
                                                               ->  Hash
 (cost=10.50..10.50 rows=1 width=68) (never executed)
                                                                     ->
 Seq Scan on item item_5  (cost=0.00..10.50 rows=1 width=68) (never
executed)

 Filter: (i_category = 'Children'::bpchar)
                                                         ->  Index Scan
using web_sales_pkey on web_sales  (cost=0.15..1.26 rows=1 width=26) (never
execute>
                                                               Index Cond:
(ws_item_sk = item_4.i_item_sk)
                                                   ->  Index Scan using
date_dim_pkey on date_dim date_dim_2  (cost=0.15..0.34 rows=1 width=4)
(never execu>
                                                         Index Cond:
(d_date_sk = web_sales.ws_sold_date_sk)
                                                         Filter: ((d_year =
1999) AND (d_moy = 9))
                                             ->  Index Scan using
customer_address_pkey on customer_address customer_address_2
 (cost=0.14..0.33 rows=1 wid>
                                                   Index Cond:
(ca_address_sk = web_sales.ws_bill_addr_sk)
                                                   Filter: (ca_gmt_offset =
'-6'::numeric)
 Planning Time: 2.630 ms
 Execution Time: 0.330 ms
(82 rows)



On Wed, Nov 27, 2024 at 7:52 PM Andrei Lepikhov <lepi...@gmail.com> wrote:

> On 22/11/2024 18:12, Ba Jinsheng wrote:
> > I think the key difference is that the patch disables the usage of Hash
> > Join, which incurs a worse performance.
> I see here a problem with a number of groups: when predicting it
> incorrectly, Postgres doesn't use the Memoize node. Disabling HashJoin
> puts NestLoop+Memoize at the place of the best path, which is chosen later.
> Unfortunately, we can't see a prediction on the number of groups in
> Memoize and can only guess the issue.
>
> --
> regards, Andrei Lepikhov
>
>
>
>
>
>

-- 
Regards,

--
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/

Re: Performance of Query 60 on TPC-DS Benchmark

Reply via email to