> On Mon, Aug 17, 2015 at 9:40 AM, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote:
> > I think SortSupport logic provides a reasonable way to solve this
> > kind of problem. For example, btint4sortsupport() informs a function
> > pointer of the fast version of comparator (btint4fastcmp) which takes
> > two Datum argument without indirect memory reference.
> > This mechanism will also make sense for HashAggregate logic, to reduce
> > the cost of function invocations.
> >
> > Please comment on the idea I noticed here.
> 
> It's possible that this can work, but it might be a good idea to run
> 'perf' on this query and find out where the CPU time is actually
> going.  That might give you a clearer picture of why the HashAggregate
> is slow.
>
I tried to run one of CTE portion under the perf enabled.

HashAggregate still takes 490sec in spite of 70sec by underlying Join.


tpcds100=# explain analyze select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_last_name
       ,c_preferred_cust_flag customer_preferred_cust_flag
       ,c_birth_country customer_birth_country
       ,c_login customer_login
       ,c_email_address customer_email_address
       ,d_year dyear
       
,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2)
 year_total
       ,'s' sale_type
 from customer
     ,store_sales
     ,date_dim
 where c_customer_sk = ss_customer_sk
   and ss_sold_date_sk = d_date_sk
 group by c_customer_id
         ,c_first_name
         ,c_last_name
         ,c_preferred_cust_flag
         ,c_birth_country
         ,c_login
         ,c_email_address
         ,d_year
;
                                             QUERY PLAN

---------------------------------------------------------------------------------------------------------------------
HashAggregate  (cost=18194948.40..21477516.00 rows=262605408 width=178)
               (actual time=483480.161..490763.640 rows=9142442 loops=1)
   Group Key: customer.c_customer_id, customer.c_first_name, 
customer.c_last_name, customer.c_preferred_cust_flag,
              customer.c_birth_country, customer.c_login, 
customer.c_email_address, date_dim.d_year
   ->  Custom Scan (GpuJoin)  (cost=101342.54..9660272.64 rows=262605408 
width=178)
                              (actual time=2430.787..73116.553 rows=268562375 
loops=1)
         Bulkload: On (density: 100.00%)
         Depth 1: Logic: GpuHashJoin, HashKeys: (ss_sold_date_sk), JoinQual: 
(ss_sold_date_sk = d_date_sk),
                  nrows (287997024 -> 275041999, 95.50% expected 95.47%)
         Depth 2: Logic: GpuHashJoin, HashKeys: (ss_customer_sk), JoinQual: 
(ss_customer_sk = c_customer_sk),
                  nrows (275041999 -> 268562375, 93.25% expected 91.18%)
         ->  Custom Scan (BulkScan) on store_sales  (cost=0.00..9649559.60 
rows=287996960 width=38)
                                                    (actual 
time=17.141..52757.354 rows=287997024 loops=1)
         ->  Seq Scan on date_dim  (cost=0.00..2705.49 rows=73049 width=16)
                                   (actual time=0.030..20.597 rows=73049 
loops=1)
         ->  Seq Scan on customer  (cost=0.00..87141.74 rows=2000074 width=156)
                                   (actual time=0.010..585.861 rows=2000000 
loops=1)
 Planning time: 1.558 ms
 Execution time: 492113.558 ms
(11 rows)


Perf output is below. Unlike my expectation, the largest portion was consumed
by bpchareq(6.76%) + bcTruelen(8.23%). One other big cluster is, probabaly,
TupleHashTableHash(1.11%) -> slot_getattr(4.29%) -> slot_deform_tuple(4.92%).


# ========
# captured on: Thu Aug 20 09:52:24 2015
# hostname : ayu.kaigai.gr.jp
# os release : 2.6.32-504.23.4.el6.x86_64
# perf version : 2.6.32-504.23.4.el6.x86_64.debug
# arch : x86_64
# nrcpus online : 48
# nrcpus avail : 48
# cpudesc : Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
# cpuid : GenuineIntel,6,63,2
# total memory : 396795400 kB
# cmdline : /usr/bin/perf record -a -e cycles
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, 
excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, 
attr_mmap2 = 0, attr_mmap  = 1, attr_mmap_data = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, tracepoint = 2, software = 1
# ========
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 1558291468259
#
# Overhead          Command               Shared Object                         
                        Symbol
# ........  ...............  ..........................  
.....................................................
#
     8.23%         postgres  postgres                    [.] bcTruelen
     6.76%         postgres  postgres                    [.] bpchareq
     4.92%         postgres  postgres                    [.] pg_detoast_datum
     4.29%         postgres  postgres                    [.] slot_getattr
     4.07%         postgres  postgres                    [.] AllocSetAlloc
     3.58%         postgres  postgres                    [.] slot_deform_tuple
     3.39%         postgres  postgres                    [.] div_var
     3.35%         postgres  postgres                    [.] 
hash_search_with_hash_value
     3.11%         postgres  postgres                    [.] hash_any
     2.62%         postgres  postgres                    [.] make_result
     2.50%         postgres  postgres                    [.] add_abs
     2.24%         postgres  postgres                    [.] ExecAgg
     2.23%         postgres  postgres                    [.] init_var_from_num
     2.09%         postgres  postgres                    [.] 
pg_detoast_datum_packed
     2.07%         postgres  postgres                    [.] 
ExecMakeFunctionResultNoSets
     1.95%         postgres  [vsyscall]                  [.] 0x000000000000014c
     1.88%         postgres  libc-2.12.so                [.] memcpy
     1.83%         postgres  postgres                    [.] execTuplesMatch
     1.71%         postgres  postgres                    [.] sub_abs
     1.70%         postgres  [kernel.kallsyms]           [k] 
copy_user_generic_string
     1.48%         postgres  pg_strom.so                 [.] 
pgstrom_data_store_insert_block
     1.41%         postgres  postgres                    [.] palloc
     1.38%         postgres  postgres                    [.] texteq
     1.29%         postgres  [vdso]                      [.] 0x0000000000000890
     1.11%         postgres  postgres                    [.] TupleHashTableHash
      :
(only larger than 1.0%)


Indeed, 6 of 8 grouping keys in this query uses bpchar() data type, so it is
natural comparison function consumed larger portion of CPU cycles.
Do we have any idea to assist these queries by the backend?


tpcds100=# \d customer
                  Table "public.customer"
         Column         |         Type          | Modifiers
------------------------+-----------------------+-----------
 c_customer_sk          | bigint                | not null
 c_customer_id          | character(16)         | not null
 c_current_cdemo_sk     | bigint                |
 c_current_hdemo_sk     | bigint                |
 c_current_addr_sk      | bigint                |
 c_first_shipto_date_sk | bigint                |
 c_first_sales_date_sk  | bigint                |
 c_salutation           | character(10)         |
 c_first_name           | character(20)         |
 c_last_name            | character(30)         |
 c_preferred_cust_flag  | character(1)          |
 c_birth_day            | bigint                |
 c_birth_month          | bigint                |
 c_birth_year           | bigint                |
 c_birth_country        | character varying(20) |
 c_login                | character(13)         |
 c_email_address        | character(50)         |
 c_last_review_date_sk  | bigint                |


Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to