On 03/19/15 21:41, hitesh ramani wrote:
> Hello Tomas,
>
> > Could you please elaborate more on why you chose CUDA, an NVIDIA-only technology, rather than OpenCL, supported by a much wider range of companies and projects? Why do you consider OpenCL unsuitable?
> >
> > Not that CUDA is bad - it certainly works better in some scenarios, but this is a cost/benefits question, and it only works with devices manufactured by a single company. That significantly limits the usefulness of the work, IMHO.
>
> I would never say OpenCL is unsuitable; I just meant that, in the research I did, CUDA came out with better results. I do agree OpenCL is also a great tool for exploiting the power of GPUs. My aim is to improve performance using CUDA, though an OpenCL implementation might work great too!
My point was that using open standards and frameworks (OpenCL) has a much higher chance of being welcomed by the community of open source projects, compared to proprietary technologies like CUDA.
> > You mention that you ran into issues with PG Strom. What issues?
>
> While I was trying to compile it, I ran into the error "src/main.c:27:29: fatal error: utils/ruleutils.h: No such file or directory". When I ran make against the branch of Postgres suggested in the description, i.e. the custom_join branch, I still ran into the same issue. Moreover, I couldn't locate the file.
That's strange, and you should probably ask the people on the PG Strom project. I haven't tried PG Strom for a long time, but compilation worked fine some time ago.
> > Can we see some examples, what this actually means? What you can and can't do at this point, etc.? Can you share some numbers how this improves the performance?
>
> I did some benchmarking of quicksort on 1M random numbers (range 0 to 0xffffff) on GPU and CPU; the results showed a 700% improvement on the GPU.
So you've created an array of 1M integers, and it's 7x faster on GPU compared to pg_qsort(), correct?
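For reference, the CPU side of such a microbenchmark might look like the sketch below. It is an assumption, not the code that was actually benchmarked: libc qsort() stands in for pg_qsort(), and the helper names (fill_random, time_cpu_sort) are hypothetical. Only the 1M element count and the 0..0xffffff value range come from the message above.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Three-way int comparison; avoids the overflow risk of returning x - y. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Fills data[0..n) with pseudo-random values in [0, 0xffffff].
 * RAND_MAX may be as small as 32767, so two rand() calls are combined;
 * good enough for a benchmark input, not for statistics. */
static void fill_random(int *data, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
    {
        unsigned hi = (unsigned) rand();
        unsigned lo = (unsigned) rand();
        data[i] = (int) (((hi << 12) ^ lo) & 0xffffffu);
    }
}

/* Returns the CPU time (seconds) spent in qsort() on n random ints,
 * or -1.0 if allocation fails. Only the sort itself is timed. */
static double time_cpu_sort(size_t n, unsigned seed)
{
    if (n == 0)
        return 0.0;

    int *data = malloc(n * sizeof *data);
    if (data == NULL)
        return -1.0;
    fill_random(data, n, seed);

    clock_t start = clock();
    qsort(data, n, sizeof *data, cmp_int);
    clock_t end = clock();

    /* Sanity check outside the timed region: output must be sorted. */
    for (size_t i = 1; i < n; i++)
        assert(data[i - 1] <= data[i]);

    free(data);
    return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling time_cpu_sort(1000000, 42) covers the CPU half of the comparison; for a fair GPU number, the host-to-device and device-to-host copies would arguably need to be timed as well.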
Well, it might surprise you, but PostgreSQL almost never sorts numbers like this. PostgreSQL sorts tuples, which is way more complicated and, considering the variable length of tuples (causing issues with memory access), rather unsuitable for GPU devices. I might be missing something, of course.
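To make the difference concrete: sorting tuples usually means sorting small fixed-size entries that point to variable-length data stored elsewhere. The sketch below is hypothetical and far simpler than PostgreSQL's real structures (VarTuple, SortEntry, cmp_entry and make_tuple are invented names); it is only loosely modeled on the idea behind SortTuple in tuplesort.c, which keeps the first key (datum1) next to a pointer to the tuple. Whenever the cached keys tie, the comparator has to chase pointers into scattered memory, the kind of irregular access mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length tuple: a length header plus payload bytes. */
typedef struct VarTuple
{
    uint32_t len;      /* payload length in bytes */
    char     data[];   /* flexible array member holding the payload */
} VarTuple;

/* Hypothetical sort entry: cached leading key + pointer to the full tuple. */
typedef struct SortEntry
{
    int32_t   key1;    /* cached first sort key */
    VarTuple *tuple;   /* variable-length data lives elsewhere in memory */
} SortEntry;

static int cmp_entry(const void *a, const void *b)
{
    const SortEntry *x = a;
    const SortEntry *y = b;

    /* Fast path: compare the cached key without touching the tuple. */
    if (x->key1 != y->key1)
        return (x->key1 > y->key1) - (x->key1 < y->key1);

    /* Tie: dereference both tuples and compare payload bytes. */
    uint32_t n = x->tuple->len < y->tuple->len ? x->tuple->len : y->tuple->len;
    int c = memcmp(x->tuple->data, y->tuple->data, n);
    if (c != 0)
        return c;
    return (x->tuple->len > y->tuple->len) - (x->tuple->len < y->tuple->len);
}

/* Allocates a tuple holding a copy of payload (without the trailing NUL). */
static VarTuple *make_tuple(const char *payload)
{
    size_t len = strlen(payload);
    VarTuple *t = malloc(sizeof *t + len);
    assert(t != NULL);
    t->len = (uint32_t) len;
    memcpy(t->data, payload, len);
    return t;
}
```

The entries are what qsort() moves around; the tuples themselves stay where they were allocated, so every tie-break touches memory at unpredictable addresses.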
Also, sorting often needs additional information, for example collations when sorting by a text field.
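A small libc-level illustration of why raw byte comparisons are not enough for text: the comparator below uses strcoll(), so the resulting order depends on the LC_COLLATE locale in effect. sort_with_collation() is a hypothetical helper, and PostgreSQL's own collation handling is considerably more involved than this.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Compares two C strings (elements of a const char * array) using the
 * LC_COLLATE category of the current C locale. */
static int cmp_collate(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcoll(*x, *y);
}

/* Sorts strs[0..n) under the given locale name. Returns 0 on success,
 * -1 if that locale is not available on this system. */
static int sort_with_collation(const char **strs, size_t n, const char *locale)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;
    qsort(strs, n, sizeof *strs, cmp_collate);
    return 0;
}
```

Under the "C" locale this produces plain byte order (A, B, a, b); under a linguistic locale such as en_US.UTF-8, if it is installed, upper- and lowercase letters would generally be interleaved instead. A GPU kernel comparing raw bytes would get the second case wrong.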
> What this means and what I can do at this point: my aim was to integrate CUDA with Postgres so that I can call the GPU for the sorting operation. To start, I wrote a simple CUDA hello world program and edited the code to call it from qsort. I ran into name mangling issues, which I sorted out by creating two different .h files, one for the CUDA program and one for the call I made from qsort. Finally, I edited the Makefile to compile the CUDA program as part of the Postgres build, so now when I compile my Postgres code, the CUDA file gets compiled too and prints the expected output on the server side.
Why don't you show us the source code? Would be simpler than explaining what it does.
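For what it's worth, the usual way to call nvcc-compiled code from C without name mangling trouble is a single header whose declarations get extern "C" linkage when compiled as C++ (nvcc compiles .cu files as C++). A minimal sketch, assuming a hypothetical gpu_sort_int32() entry point; the definition at the end is a plain-C CPU stand-in so the snippet builds without nvcc, not a GPU implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- gpu_sort.h (hypothetical shared header) ----
 * When this is compiled as C++ (e.g. by nvcc for a .cu file), the
 * declarations get C linkage, so the symbol is not mangled and the C
 * backend code can link against it. Compiled as C, the guards vanish. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical entry point: sorts data[0..n) in place, returns 0 on success. */
int gpu_sort_int32(int *data, size_t n);

#ifdef __cplusplus
}
#endif
/* ---- end of header ---- */

/* CPU stand-in so this sketch links without nvcc. In a real build the
 * .cu file would include the header above and define gpu_sort_int32()
 * there, launching a kernel instead of calling qsort(). */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

int gpu_sort_int32(int *data, size_t n)
{
    assert(n == 0 || data != NULL);
    qsort(data, n, sizeof *data, cmp_int);
    return 0;
}
```

Because the .cu file would see the extern "C" declaration before its definition, the definition inherits C linkage too, so a single header of this form may be enough instead of two separate ones.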
> What I still haven't done: I haven't actually improved the sorting yet. I'm still analyzing the code to figure out how to tinker with it and what the right approach is.
I'd recommend discussing the code here. It's certainly quite complex, especially if this is your first encounter with it.
> > That's really difficult to judge, because you have not provided any source code, examples or anything else to support this.
> >
> > > Please give your valuable suggestions and views on this.
> >
> > From where I sit, this looks interesting, but more as a research project than something that can be integrated into PostgreSQL in the foreseeable future. Not sure that's what GSoC is intended for.
> >
> > Also, we badly need more details on this - current status, examples, and especially a project plan explaining the scope. It's impossible to say whether the sort can be implemented within the GSoC time frame.
>
> What I actually see this as is a branch of Postgres which has CUDA-compatible features.

I find it very unlikely that this project will choose something that is intended as a fork.

> I wanted to start with sorting, which can be improved further. To be honest, I'm still analyzing the sort code for inputs above a million integer elements (in a single row, for now), so that the use of GPUs is actually significant. As far as I can see, Postgres uses an external sort for that.
PostgreSQL uses an adaptive sort: in-memory when the data fits into work_mem, on-disk when it does not. This is decided at runtime.
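The runtime decision can be pictured as a small state machine fed one tuple at a time. This is a simplified sketch, not tuplesort.c: SortMode, SortState, sort_begin() and sort_put() are hypothetical names, budget_bytes merely stands in for work_mem, and nothing is actually written to disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sort modes. */
typedef enum SortMode
{
    SORT_IN_MEMORY,   /* everything accepted so far fits in the budget */
    SORT_EXTERNAL     /* the budget was exceeded at least once */
} SortMode;

typedef struct SortState
{
    size_t   budget_bytes;  /* stands in for work_mem */
    size_t   used_bytes;    /* bytes currently buffered in memory */
    size_t   runs_spilled;  /* sorted runs that would have gone to disk */
    SortMode mode;
} SortState;

static void sort_begin(SortState *st, size_t budget_bytes)
{
    assert(budget_bytes > 0);
    st->budget_bytes = budget_bytes;
    st->used_bytes = 0;
    st->runs_spilled = 0;
    st->mode = SORT_IN_MEMORY;
}

/* Called once per incoming tuple. The total input size is not known up
 * front, so the switch to external sorting can only happen here. */
static void sort_put(SortState *st, size_t tuple_bytes)
{
    st->used_bytes += tuple_bytes;
    if (st->used_bytes > st->budget_bytes)
    {
        /* A real implementation would sort the buffered tuples, write
         * them out as a run and free the memory; here we only record it. */
        st->mode = SORT_EXTERNAL;
        st->runs_spilled++;
        st->used_bytes = 0;
    }
}
```

A GPU path would presumably need the same shape, with device memory playing the role of the budget.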
You'll have to do the same thing, because the amount of memory available on GPUs is limited to a few GBs, and it needs to work for datasets exceeding that limit (the amount of data is uncertain at planning time).
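One way to handle inputs larger than device memory is to sort device-sized runs independently and merge them afterwards, similar in spirit to an external merge sort. A minimal in-memory sketch, assuming a hypothetical chunk_elems limit that stands in for GPU memory; qsort() replaces the device sort, and neither spilling runs to disk nor host-device transfer costs are modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Sorts data[0..n) in two phases:
 *   1. sort runs of at most chunk_elems elements independently
 *      (the part a memory-limited GPU could take over);
 *   2. k-way merge of the sorted runs on the CPU.
 * Returns 0 on success, -1 if allocation fails. */
static int chunked_sort(int *data, size_t n, size_t chunk_elems)
{
    assert(chunk_elems > 0);
    if (n == 0)
        return 0;

    size_t nruns = (n + chunk_elems - 1) / chunk_elems;

    /* Phase 1: each run fits the (hypothetical) device memory budget. */
    for (size_t r = 0; r < nruns; r++)
    {
        size_t start = r * chunk_elems;
        size_t len = (n - start < chunk_elems) ? n - start : chunk_elems;
        qsort(data + start, len, sizeof *data, cmp_int);
    }

    /* Phase 2: pos[r] is the next unread index of run r. A linear scan of
     * the run heads picks the minimum; many runs would call for a heap. */
    size_t *pos = malloc(nruns * sizeof *pos);
    int *out = malloc(n * sizeof *out);
    if (pos == NULL || out == NULL)
    {
        free(pos);
        free(out);
        return -1;
    }
    for (size_t r = 0; r < nruns; r++)
        pos[r] = r * chunk_elems;

    for (size_t k = 0; k < n; k++)
    {
        size_t best = nruns;    /* nruns means "no candidate yet" */
        for (size_t r = 0; r < nruns; r++)
        {
            size_t end = (r + 1 < nruns) ? (r + 1) * chunk_elems : n;
            if (pos[r] < end && (best == nruns || data[pos[r]] < data[pos[best]]))
                best = r;
        }
        out[k] = data[pos[best]++];
    }

    memcpy(data, out, n * sizeof *data);
    free(pos);
    free(out);
    return 0;
}
```

The merge phase is sequential here, which is one reason the GPU speedup on the run-sorting phase would not translate directly into an end-to-end speedup.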
> If you feel this isn't feasible in such a time span, I would love to hear suggestions for any small function that could benefit from parallelism.
I honestly don't know.

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers