Hello devs,
Thank you so much for the feedback. To answer your questions:
Tomas:
>So you've created an array of 1M integers, and it's 7x faster on GPU
>compared to pg_qsort(), correct?
No, I meant it's faster than a general CPU sort, not pg_qsort() specifically.
>Well, it might surprise you, but PostgreSQL almost never sorts numbers
>like this. PostgreSQL sorts tuples, which is way more complicated and,
>considering the variable length of tuples (causing issues with memory
>access), rather unsuitable for GPU devices. I might be missing
>something, of course.
>
>Also, it often needs additional information, like collations when
>sorting by a text field, for example.
I totally agree with you on this point. My current target area is very confined,
as this is just the beginning: I'm only considering integer values in a single
column.
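To make the narrow case concrete, here is a minimal host-side sketch of the dispatch I have in mind. `sort_ints`, `gpu_sort`, and `GPU_SORT_THRESHOLD` are hypothetical names for illustration only (the stand-in `gpu_sort` just falls back to qsort so the sketch is self-contained; the real cutoff would need benchmarking):

```c
#include <stdlib.h>

/* Hypothetical cutoff below which GPU transfer overhead would likely
 * outweigh any speedup; the real value needs benchmarking. */
#define GPU_SORT_THRESHOLD 100000

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/* Stand-in for the CUDA entry point; falls back to qsort here so the
 * sketch compiles and runs without a GPU. */
static void
gpu_sort(int *data, size_t n)
{
	qsort(data, n, sizeof(int), cmp_int);
}

/* Dispatch: large integer arrays take the (hypothetical) GPU path,
 * small ones stay on the CPU. */
void
sort_ints(int *data, size_t n)
{
	if (n >= GPU_SORT_THRESHOLD)
		gpu_sort(data, n);
	else
		qsort(data, n, sizeof(int), cmp_int);
}
```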
>Why don't you show us the source code? Would be simpler than explaining
>what it does.
You can have a look at the code here:
https://github.com/hiteshramani/Postgres-CUDA
The code compiles; you can see the call to the CUDA function in
src/port/qsort.c and in the headers qsort_normal.h and qsort_cuda.h. The
hello-world program is in src/port/qsort_cuda.cu. Compilation happens in
two phases, compile and link: I compiled the CUDA file with nvcc, and
for linking I edited the Makefile of src/timezone/, because the zic
build needed to link against the CUDA object.
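For reference, the two-phase build looks roughly like this; this is only a sketch of my local setup, and `ZICOBJS` is a placeholder for the usual object list, not the actual variable name in the Makefile:

```makefile
# Phase 1: compile the CUDA source separately with nvcc
qsort_cuda.o: qsort_cuda.cu
	nvcc -c -o qsort_cuda.o qsort_cuda.cu

# Phase 2: link the CUDA object and runtime into the binary
# (e.g. zic in src/timezone/), alongside the usual objects
zic: $(ZICOBJS) qsort_cuda.o
	$(CC) $(CFLAGS) $(ZICOBJS) qsort_cuda.o -lcudart $(LDFLAGS) -o $@
```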
Suggestions are welcome.
>I'd recommend discussing the code here. It's certainly quite complex,
>especially if this is your first encounter with it.
Yes, I felt it's a little complex, but I couldn't find many help resources
online. I'm looking for help.
>PostgreSQL uses adaptive sort - in-memory when it fits into work_mem,
>on-disk when it does not. This is decided at runtime.
>
>You'll have to do the same thing, because the amount of memory available
>on GPUs is limited to a few GBs, and it needs to work for datasets
>exceeding that limit (the amount of data is uncertain at planning time).
Yes, I thought of that too. A call could be made with the integer array as an
input to the GPU, and the GPU would then return the sorted array. I want to
proceed step by step, as there are methods to sort datasets that exceed the
GPU memory.
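The chunking idea could look like this on the host side: sort runs that fit within the device-memory limit, then merge the sorted runs on the CPU. This is only an illustration of the approach, not the patch's code; qsort stands in for the GPU sort of each chunk:

```c
#include <stdlib.h>
#include <string.h>

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/* Sort `data` in runs of at most `chunk` elements (the pretend
 * device-memory limit), then merge adjacent runs pairwise. */
void
chunked_sort(int *data, size_t n, size_t chunk)
{
	int		   *tmp;

	/* Phase 1: sort each chunk independently (the GPU would do this). */
	for (size_t off = 0; off < n; off += chunk)
	{
		size_t		len = (n - off < chunk) ? n - off : chunk;

		qsort(data + off, len, sizeof(int), cmp_int);
	}

	/* Phase 2: repeatedly merge adjacent runs until one run remains. */
	tmp = malloc(n * sizeof(int));
	if (tmp == NULL)
		return;					/* sketch only; real code must handle this */
	for (size_t run = chunk; run < n; run *= 2)
	{
		for (size_t off = 0; off < n; off += 2 * run)
		{
			size_t		mid = (off + run < n) ? off + run : n;
			size_t		end = (off + 2 * run < n) ? off + 2 * run : n;
			size_t		i = off, j = mid, k = off;

			while (i < mid && j < end)
				tmp[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
			while (i < mid)
				tmp[k++] = data[i++];
			while (j < end)
				tmp[k++] = data[j++];
		}
		memcpy(data, tmp, n * sizeof(int));
	}
	free(tmp);
}
```

In the real case, phase 2 would itself be disk-aware, like PostgreSQL's existing external merge, with the GPU only replacing the per-run sort.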
Álvaro Herrera:
I downloaded the zip of the latest custom_join repo I saw 2 days ago.
I'll check once again. Thank you. :)
KaiGai Kohei:
>Let me say CUDA is better than OpenCL :-)
>Because of software quality of OpenCL runtime drivers provided by each
>vendor, I've often faced mysterious problems. Only nvidia's runtime are
>enough reliable from my point of view. In addition, when we implement
>using OpenCL is a feature fully depends on hardware characteristics, so
>we cannot ignore physical hardware underlying the abstraction layer.
>So, I'm now reworking the code to move CUDA from OpenCL.
That's great, I'd love to help you with that and contribute to it.
>It seems to me you are a little bit optimistic.
>Unlike CPU code, GPU-Sorting logic has to reference device memory space,
>so all the data to be compared needs to be transferred to GPU devices.
>Any pointer on host address space is not valid on GPU calculation.
>Amount of device memory is usually smaller than host memory, so your
>code needs a capability to combined multiple chunks that is partially
>sorted... Probably, it is not all here.
Aren't there algorithms that help when the device memory is limited and the
data is massive? I have a rough memory of this from an online course, where I
believe I saw algorithms for dealing with such problems.
Thanks and Regards,
Hitesh Ramani