> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <[email protected]> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
>
> I haven't really paid any attention to PGStrom. Perhaps it's just that I
> missed it, but I would find it useful if you could direct me towards a
> benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling compared
> to traditional strategies already implemented in Postgres and other
> systems.
>
Implementation of hash-join on the GPU side is still under development.
The only use case available right now is an alternative scan path that
replaces a full table scan when a table contains a massive number of
records and the qualifiers are sufficiently complicated.
The EXPLAIN output below shows a sequential scan on a table containing
80M records (all of them in memory; no disk accesses during execution).
An NVIDIA GT640 shows an advantage over a single-threaded Core i5 4570S,
at least.
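For reference, here is a minimal sketch of a setup that should produce a
comparable workload. The column names x, y and the 80M row count are taken
from the plans below; the exact DDL was not posted, so treat this as a
hypothetical reconstruction:

    -- hypothetical reconstruction of the benchmark table (not the
    -- original DDL): 80M rows with float8 columns x and y
    CREATE TABLE t1 (id int, x float8, y float8);
    INSERT INTO t1
        SELECT i, random() * 100.0, random() * 100.0
          FROM generate_series(1, 80000000) i;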
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)
postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)
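In other words, total runtime drops from 46648.668 ms to 1784.793 ms on the
same query, roughly a 26x speedup (46648.668 / 1784.793 = 26.1), even
though DMA send alone accounts for 933.318 ms of the GPU run.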
> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
>
If our purpose were to consume 100% of a GPU device's capacity, memory
bandwidth would indeed be troublesome. But I'm not interested in GPU
benchmarking.
What I want to do is accelerate complicated query processing beyond what
the existing RDBMS offers, at low cost and in a way that is transparent
to existing applications.
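For example, because the switch is an ordinary GUC (the same
pg_strom.enabled knob used in the session above), GPU scanning could in
principle be enabled for a whole database without touching application
code; a sketch, with a hypothetical database name:

    -- hypothetical: let the planner consider GpuScan for every session
    -- on this database; queries themselves are unchanged, and the
    -- cost-based optimizer still decides when GpuScan actually wins
    ALTER DATABASE mydb SET pg_strom.enabled = on;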
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <[email protected]>