
I've run some tests on pg_strom following the wiki, but queries seem to run
slower on the GPU than on the CPU. Can someone shed some light on what's wrong
with my settings? My setup is a Quadro K620 + CUDA 7.0 (the package for
Ubuntu 14.10) + Ubuntu 15.04. The results were:

with pg_strom

explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;

 Aggregate  (cost=190993.70..190993.71 rows=1 width=0) (actual time=18792.236..18792.236 rows=1 loops=1)
   ->  Custom Scan (GpuPreAgg)  (cost=7933.07..184161.18 rows=86 width=108) (actual time=4249.656..18792.074 rows=77 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: NoGroup
         Device Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
         ->  Custom Scan (BulkScan) on t0  (cost=6933.07..182660.32 rows=10000060 width=0) (actual time=139.399..18499.246 rows=10000000 loops=1)
 Planning time: 0.262 ms
 Execution time: 19268.650 ms
(8 rows)

explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;

                                                                    QUERY PLAN
 HashAggregate  (cost=298541.48..298541.81 rows=26 width=12) (actual time=11311.568..11311.572 rows=26 loops=1)
   Group Key: t0.cat
   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) (actual time=3304.727..11310.021 rows=2307 loops=1)
         Bulkload: On (density: 100.00%)
         Reduction: Local + Global
         ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=10000060 width=12) (actual time=923.417..2661.113 rows=10000000 loops=1)
               Bulkload: On (density: 100.00%)
               Depth 1: Logic: GpuHashJoin, HashKeys: (aid), JoinQual: (aid = aid), nrows_ratio: 1.00000000
               ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=6.980..871.431 rows=10000000 loops=1)
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=4) (actual time=0.204..7.309 rows=40000 loops=1)
 Planning time: 47.834 ms
 Execution time: 11355.103 ms
(12 rows)
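
To see where the time goes in the join plan above, the per-node "actual time"
pairs can be pulled out of the EXPLAIN ANALYZE text. This is only a minimal
shell sketch, assuming grep and sed with -E support; node_times is a
hypothetical helper name, and the three sample lines are copied from the plan
above.

```shell
# Print "<start> <end>" (in ms) for every "actual time=a..b" found in
# EXPLAIN ANALYZE text read from stdin. node_times is a hypothetical helper.
node_times() {
    grep -oE 'time=[0-9]+\.[0-9]+\.\.[0-9]+\.[0-9]+' \
        | sed -E 's/^time=//; s/\.\./ /'
}

# Three nodes copied from the GPU join plan (GpuPreAgg, GpuJoin, BulkScan).
plan='   ->  Custom Scan (GpuPreAgg)  (cost=5178.82..250302.07 rows=1088 width=52) (actual time=3304.727..11310.021 rows=2307 loops=1)
         ->  Custom Scan (GpuJoin)  (cost=4178.82..248541.18 rows=10000060 width=12) (actual time=923.417..2661.113 rows=10000000 loops=1)
               ->  Custom Scan (BulkScan) on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=6.980..871.431 rows=10000000 loops=1)'

printf '%s\n' "$plan" | node_times
```

If I read the numbers correctly, BulkScan finishes at 871.431 ms and GpuJoin at
2661.113 ms, but GpuPreAgg only finishes at 11310.021 ms, so roughly 8.6
seconds seem to pass between the end of GpuJoin and the end of GpuPreAgg.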

without pg_strom

test=# explain analyze SELECT count(*) FROM t0 WHERE sqrt((x-25.6)^2 + (y-12.8)^2) < 10;
 Aggregate  (cost=426193.03..426193.04 rows=1 width=0) (actual time=3880.379..3880.379 rows=1 loops=1)
   ->  Seq Scan on t0  (cost=0.00..417859.65 rows=3333353 width=0) (actual time=0.075..3859.200 rows=314063 loops=1)
         Filter: (sqrt((((x - '25.6'::double precision) ^ '2'::double precision) + ((y - '12.8'::double precision) ^ '2'::double precision))) < '10'::double precision)
         Rows Removed by Filter: 9685937
 Planning time: 0.411 ms
 Execution time: 3880.445 ms
(6 rows)

test=# explain analyze SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 GROUP BY cat;
                                                          QUERY PLAN
 HashAggregate  (cost=431593.73..431594.05 rows=26 width=12) (actual time=4960.810..4960.812 rows=26 loops=1)
   Group Key: t0.cat
   ->  Hash Join  (cost=1234.00..381593.43 rows=10000060 width=12) (actual time=20.859..3367.510 rows=10000000 loops=1)
         Hash Cond: (t0.aid = t1.aid)
         ->  Seq Scan on t0  (cost=0.00..242858.60 rows=10000060 width=16) (actual time=0.021..895.908 rows=10000000 loops=1)
         ->  Hash  (cost=734.00..734.00 rows=40000 width=4) (actual time=20.567..20.567 rows=40000 loops=1)
               Buckets: 65536  Batches: 1  Memory Usage: 1919kB
               ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=4) (actual time=0.017..11.013 rows=40000 loops=1)
 Planning time: 0.567 ms
 Execution time: 4961.029 ms
(10 rows)
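
To put a number on the gap, the "Execution time" lines above can be compared
directly. This is just a quick calculation on the figures already shown; ratio
is a hypothetical helper name and awk is assumed to be available.

```shell
# Print the ratio of two execution times with two decimals.
# ratio is a hypothetical helper; arguments are milliseconds.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Execution times copied from the plans above: with pg_strom, then without.
ratio 19268.650 3880.445   # count(*) with the sqrt() filter
ratio 11355.103 4961.029   # join + GROUP BY
```

This prints 4.97 and 2.29, so with pg_strom the filter query is about 5 times
slower and the join query about 2.3 times slower than the plain CPU plans.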

Here are the details of how I installed pg_strom:

1. download postgresql 9.5alpha1 and compile it with

    | ./configure --prefix=/export/pg-9.5 --enable-debug --enable-cassert
    | make -j8 all
    | make install

2. install cuda-7.0 (ubuntu 14.10 package from nvidia website)

3. download and compile pg_strom with pg_config in /export/pg-9.5/bin

        | make
        | make install

4. create a db cluster with --no-locale

        | initdb --no-locale 9.5

5. change postgresql.conf

        | shared_buffers=1GB
        | shared_preload_libraries='pg_strom.so'
        | logging_collector = on
        | log_filename='postgresql-%d.log'
        | pg_strom.enabled=on

6. start postgres

        | pg_ctl -D 9.5 start

   and got the following outputs

        | LOG:  CUDA Runtime version: 7.0.0
        | LOG:  NVIDIA driver version: 346.59
        | LOG:  GPU0 Quadro K620 (384 CUDA cores, 1124MHz), L2 2048KB, RAM 2047MB (128bits, 900KHz), capability 5.0
        | LOG:  NVRTC - CUDA Runtime Compilation vertion 7.0
        | LOG:  redirecting log output to logging collector process
        | HINT:  Future log output will appear in directory "pg_log".

7. import testdb

        | createdb test
        | psql test < ~/devel/pg_strom/test/testdb.sql
        | psql test -c 'create extension pg_strom'

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)