Hello devs,
Thank you so much for the feedback. To answer your questions:
Tomas:
>So you've created an array of 1M integers, and it's 7x faster on GPU
>compared to pg_qsort(), correct?
No, I meant it's faster than a general CPU sort, not pg_qsort() specifically.
>Well, it might surprise you, but PostgreSQL almost never sorts numbers
>like this. PostgreSQL sorts tuples, which is way more complicated and,
>considering the variable length of tuples (causing issues with memory
>access), rather unsuitable for GPU devices. I might be missing
>something, of course.
>
>Also, it often needs additional information, like collations when
>sorting by a text field, for example.
I totally agree with you on this point. My current target area is very confined,
as this is just the beginning: I'm only considering integer values in a single
column.
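To make the narrow case concrete, here is a minimal host-side sketch of the dispatch I have in mind. `sort_ints`, `gpu_sort`, and `GPU_SORT_THRESHOLD` are hypothetical names for illustration only (the stand-in `gpu_sort` just falls back to qsort so the sketch is self-contained; the real cutoff would need benchmarking):

```c
#include <stdlib.h>

/* Hypothetical cutoff below which GPU transfer overhead would likely
 * outweigh any speedup; the real value needs benchmarking. */
#define GPU_SORT_THRESHOLD 100000

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/* Stand-in for the CUDA entry point; falls back to qsort here so the
 * sketch compiles and runs without a GPU. */
static void
gpu_sort(int *data, size_t n)
{
	qsort(data, n, sizeof(int), cmp_int);
}

/* Dispatch: large integer arrays take the (hypothetical) GPU path,
 * small ones stay on the CPU. */
void
sort_ints(int *data, size_t n)
{
	if (n >= GPU_SORT_THRESHOLD)
		gpu_sort(data, n);
	else
		qsort(data, n, sizeof(int), cmp_int);
}
```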
>Why don't you show us the source code? Would be simpler than explaining
>what it does.
You can have a look at the code here:
https://github.com/hiteshramani/Postgres-CUDA
The code compiles; you can see the call to the CUDA function in
src/port/qsort.c and in the headers qsort_normal.h and qsort_cuda.h. The
hello-world program is in src/port/qsort_cuda.cu. Compilation happens in
two phases, compile and link: I compiled the CUDA file with nvcc, and
for linking I edited the Makefile of src/timezone/, because the zic
build needed to link against the CUDA object.
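For reference, the two-phase build looks roughly like this; this is only a sketch of my local setup, and `ZICOBJS` is a placeholder for the usual object list, not the actual variable name in the Makefile:

```makefile
# Phase 1: compile the CUDA source separately with nvcc
qsort_cuda.o: qsort_cuda.cu
	nvcc -c -o qsort_cuda.o qsort_cuda.cu

# Phase 2: link the CUDA object and runtime into the binary
# (e.g. zic in src/timezone/), alongside the usual objects
zic: $(ZICOBJS) qsort_cuda.o
	$(CC) $(CFLAGS) $(ZICOBJS) qsort_cuda.o -lcudart $(LDFLAGS) -o $@
```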
Suggestions are welcome.
>I'd recommend discussing the code here. It's certainly quite complex,
>especially if this is your first encounter with it.
Yes, I felt it's a little complex, but I couldn't find many help resources
online. I'm looking for help.
>PostgreSQL uses adaptive sort - in-memory when it fits into work_mem,
>on-disk when it does not. This is decided at runtime.
>
>You'll have to do the same thing, because the amount of memory available
>on GPUs is limited to a few GBs, and it needs to work for datasets
>exceeding that limit (the amount of data is uncertain at planning time).
Yes, I thought of that too. A call could be made with the integer array as an
input to the GPU, and the GPU would then return the sorted array. I want to
proceed step by step, as there are methods to sort datasets that exceed the
GPU memory.
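The chunking idea could look like this on the host side: sort runs that fit within the device-memory limit, then merge the sorted runs on the CPU. This is only an illustration of the approach, not the patch's code; qsort stands in for the GPU sort of each chunk:

```c
#include <stdlib.h>
#include <string.h>

static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/* Sort `data` in runs of at most `chunk` elements (the pretend
 * device-memory limit), then merge adjacent runs pairwise. */
void
chunked_sort(int *data, size_t n, size_t chunk)
{
	int		   *tmp;

	/* Phase 1: sort each chunk independently (the GPU would do this). */
	for (size_t off = 0; off < n; off += chunk)
	{
		size_t		len = (n - off < chunk) ? n - off : chunk;

		qsort(data + off, len, sizeof(int), cmp_int);
	}

	/* Phase 2: repeatedly merge adjacent runs until one run remains. */
	tmp = malloc(n * sizeof(int));
	if (tmp == NULL)
		return;					/* sketch only; real code must handle this */
	for (size_t run = chunk; run < n; run *= 2)
	{
		for (size_t off = 0; off < n; off += 2 * run)
		{
			size_t		mid = (off + run < n) ? off + run : n;
			size_t		end = (off + 2 * run < n) ? off + 2 * run : n;
			size_t		i = off, j = mid, k = off;

			while (i < mid && j < end)
				tmp[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
			while (i < mid)
				tmp[k++] = data[i++];
			while (j < end)
				tmp[k++] = data[j++];
		}
		memcpy(data, tmp, n * sizeof(int));
	}
	free(tmp);
}
```

In the real case, phase 2 would itself be disk-aware, like PostgreSQL's existing external merge, with the GPU only replacing the per-run sort.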
Álvaro Herrera:
I downloaded the zip of the latest custom_join repo I saw 2 days ago.
I'll check once again. Thank you. :)
KaiGai Kohei:
>Let me say CUDA is better than OpenCL :-)
>Because of software quality of OpenCL runtime drivers provided by each
>vendor, I've often faced mysterious problems. Only nvidia's runtime are
>enough reliable from my point of view. In addition, when we implement
>using OpenCL is a feature fully depends on hardware characteristics, so
>we cannot ignore physical hardware underlying the abstraction layer.
>So, I'm now reworking the code to move CUDA from OpenCL.
That's great, I'd love to help you with that and contribute to it.
>It seems to me you are a little bit optimistic.
>Unlike CPU code, GPU-Sorting logic has to reference device memory space,
>so all the data to be compared needs to be transferred to GPU devices.
>Any pointer on host address space is not valid on GPU calculation.
>Amount of device memory is usually smaller than host memory, so your
>code needs a capability to combined multiple chunks that is partially
>sorted... Probably, it is not all here.
Aren't there algorithms that help when the device memory is limited and the
data is massive? I have a rough memory of this from an online course, where I
believe I saw algorithms for dealing with such problems.
Thanks and Regards,
Hitesh Ramani