> > Could you please elaborate on why you chose CUDA, an NVIDIA-only
> > technology, rather than OpenCL, which is supported by a much wider range
> > of companies and projects? Why do you consider OpenCL unsuitable?
> >
> > Not that CUDA is bad - it certainly works better in some scenarios, but
> > this is a cost/benefits question, and it only works with devices
> > manufactured by a single company. That significantly limits the
> > usefulness of the work, IMHO.
> 
> 
> I would never say OpenCL is unsuitable; I just meant that, in the research I
> did, CUDA came out with better results. I do agree that OpenCL is also a great
> tool for exploiting the power of GPUs. My aim is to improve performance using
> CUDA, though an OpenCL implementation might work great too!
> 
Let me say CUDA is better than OpenCL :-)
Because of the quality of the OpenCL runtime drivers provided by each vendor,
I have often run into mysterious problems. From my point of view, only NVIDIA's
runtime is reliable enough. In addition, how a feature is implemented in OpenCL
depends heavily on hardware characteristics, so we cannot ignore the physical
hardware underneath the abstraction layer.
So I'm now reworking the code to move from OpenCL to CUDA.

> > You mention that you ran into issues with PG Strom. What issues?
> 
> While trying to compile, I ran into the error "src/main.c:27:29: fatal error:
> utils/ruleutils.h: No such file or directory". When I ran make against the
> Postgres branch suggested in the description, i.e. the custom_join branch, I
> still ran into the same issue. Moreover, I couldn't locate the file.
>
I think you are referencing the old branch in my personal repository.
Could you confirm the repository URL? The latest one is below.
  https://github.com/pg-strom/devel

> > Can we see some examples, what this actually means? What you can and
> > can't do at this point, etc.? Can you share some numbers how this
> > improves the performance?
> 
> I did some benchmarking of quicksort on 1M random numbers (range 0 to
> 0xffffff) on GPU and CPU; the results showed a 700% improvement on the GPU.
> 
> What this means and what I can do at this point - My aim was to integrate CUDA
> with Postgres so that I can call the GPU for the sorting operation. To start,
> I wrote a simple CUDA hello world program and edited the code to call it from
> qsort. I ran into name mangling issues, which I sorted out by creating 2
> different .h files, one for the CUDA program and one for the call I made from
> qsort. Finally, I edited the makefile to compile the CUDA program as part of
> the Postgres build, so now when I compile my Postgres code, the CUDA file gets
> compiled too and prints the expected output on the server end.
> 
> What I still haven't done - I haven't actually enhanced the sorting yet;
> I'm still analyzing the code to work out how to tinker with it and what the
> right approach is.
>
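On the name mangling: a common alternative to keeping two separate headers is
a single header guarded with extern "C", so that the C caller and the C++/CUDA
side agree on an unmangled symbol. Below is a minimal sketch of that pattern.
The file name gpu_hello.h and the function gpu_hello() are hypothetical, and
the function body is a plain C stand-in so the sketch builds without nvcc; in
a real setup the definition would live in a .cu file.

```c
/* gpu_hello.h (hypothetical name): included by both the C caller
 * and the CUDA translation unit. */
#ifndef GPU_HELLO_H
#define GPU_HELLO_H

#ifdef __cplusplus
extern "C" {            /* seen only by a C++/CUDA compiler: disables mangling */
#endif

int gpu_hello(void);    /* hypothetical entry point called from the C side */

#ifdef __cplusplus
}
#endif

#endif /* GPU_HELLO_H */

/* Stand-in definition so this sketch compiles as plain C.
 * Assumption: the real definition would be in a .cu file built by nvcc. */
#include <stdio.h>

int gpu_hello(void)
{
    puts("hello from the (stand-in) GPU side");
    return 0;           /* 0 means success in this sketch */
}
```

The C side then just includes the header and calls gpu_hello() like any other
C function.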
It seems to me you are a little bit optimistic.
Unlike CPU code, GPU sorting logic has to reference the device memory space,
so all the data to be compared needs to be transferred to the GPU device.
Pointers into the host address space are not valid in GPU computation.
The amount of device memory is usually smaller than host memory, so your code
needs to be able to combine multiple partially sorted chunks...
And that is probably not all.
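
To illustrate the chunking point, here is a rough sketch in plain C, not
PG-Strom code. It assumes only chunk_cap elements fit in device memory at
once. sort_chunk_on_device() is a hypothetical stand-in for "copy the chunk to
the device, sort it there, copy it back"; it uses qsort() here only so the
sketch runs without a GPU. The sorted chunks are then merged on the host.

```c
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Hypothetical stand-in: a real version would transfer the chunk to the
 * device (e.g. with cudaMemcpy), sort it there, and transfer it back. */
static void sort_chunk_on_device(int *chunk, size_t n)
{
    qsort(chunk, n, sizeof(int), cmp_int);
}

/* Merge sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const int *src, int *dst,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid)
        dst[k++] = src[i++];
    while (j < hi)
        dst[k++] = src[j++];
}

/* Sort data[0..n) when at most chunk_cap elements fit on the device.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int chunked_sort(int *data, size_t n, size_t chunk_cap)
{
    int *tmp, *src, *dst;

    if (chunk_cap == 0)
        return -1;
    if (n < 2)
        return 0;

    /* Phase 1: sort each chunk independently ("on the device"). */
    for (size_t off = 0; off < n; off += chunk_cap) {
        size_t len = (n - off < chunk_cap) ? n - off : chunk_cap;
        sort_chunk_on_device(data + off, len);
    }

    /* Phase 2: bottom-up merge of the sorted chunks on the host. */
    tmp = malloc(n * sizeof(int));
    if (tmp == NULL)
        return -1;
    src = data;
    dst = tmp;
    for (size_t width = chunk_cap; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(src, dst, lo, mid, hi);
        }
        int *swap = src; src = dst; dst = swap;
    }
    if (src != data)
        memcpy(data, src, n * sizeof(int));
    free(tmp);
    return 0;
}
```

Even this toy version needs a second buffer and several merge passes, and it
ignores transfer cost and non-integer datatypes entirely.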

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
