> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <[email protected]> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
>
> I haven't really paid any attention to PGStrom. Perhaps it's just that I
> missed it, but I would find it useful if you could direct me towards a
> benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling compared
> to traditional strategies already implemented in Postgres and other
> systems.
>
Implementation of hash-join on the GPU side is still under development.
The only use case available right now is an alternative scan path that
replaces a full table scan when a table contains a massive number of
records and the qualifiers are sufficiently complicated.
The EXPLAIN output below shows a sequential scan on a table containing
80M records (all of them in memory; no disk accesses during execution).
An NVIDIA GT640 shows an advantage over a single-threaded Core i5 4570S,
at least.
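For reference, here is a minimal sketch of a setup that should produce a
comparable workload. The column names x, y and the 80M row count are taken
from the plans below; the exact DDL was not posted, so treat this as a
hypothetical reconstruction:

    -- hypothetical reconstruction of the benchmark table (not the
    -- original DDL): 80M rows with float8 columns x and y
    CREATE TABLE t1 (id int, x float8, y float8);
    INSERT INTO t1
        SELECT i, random() * 100.0, random() * 100.0
          FROM generate_series(1, 80000000) i;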
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)
postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)
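In other words, total runtime drops from 46648.668 ms to 1784.793 ms on the
same query, roughly a 26x speedup (46648.668 / 1784.793 = 26.1), even
though DMA send alone accounts for 933.318 ms of the GPU run.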
> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
>
If our purpose were to consume 100% of a GPU device's capacity, memory
bandwidth would indeed be troublesome. But I'm not interested in GPU
benchmarking.
What I want to do is accelerate complicated query processing beyond what
the existing RDBMS offers, at low cost and in a way that is transparent
to existing applications.
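For example, because the switch is an ordinary GUC (the same
pg_strom.enabled knob used in the session above), GPU scanning could in
principle be enabled for a whole database without touching application
code; a sketch, with a hypothetical database name:

    -- hypothetical: let the planner consider GpuScan for every session
    -- on this database; queries themselves are unchanged, and the
    -- cost-based optimizer still decides when GpuScan actually wins
    ALTER DATABASE mydb SET pg_strom.enabled = on;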
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <[email protected]>