On Friday, 16 August 2013 at 23:30:12 UTC, John Colvin wrote:
On Friday, 16 August 2013 at 22:11:41 UTC, luminousone wrote:
On Friday, 16 August 2013 at 20:07:32 UTC, John Colvin wrote:
We have a[] = b[] * c[] - 5; etc. which could work very neatly perhaps?

While this could in fact work, given the nature of GPGPU it would
not be very effective.

In a non-shared-memory, non-cache-coherent setup, all three
arrays have to be copied into GPU memory in their entirety, the
statement run as GPU bytecode, and the result copied back to
complete the operation.

GPGPU doesn't make sense for small code blocks: the instruction
count is too low, and a statement like this is too memory-bound
to justify the transfers.
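To put rough numbers on the memory-bound point, here is a back-of-envelope model in Python. The PCIe bandwidth and GPU throughput figures are assumed purely for illustration, not measured:

```python
# Back-of-envelope cost model for offloading c[] = a[] * b[] to a discrete GPU.
# The bandwidth and throughput figures are illustrative assumptions.

def offload_worth_it(n_elements, flops_per_element=1,
                     pcie_bytes_per_sec=16e9,   # assumed PCIe transfer rate
                     gpu_flops_per_sec=1e12,    # assumed GPU throughput
                     element_size=8):           # double precision
    # Two input arrays over the bus, one result array back.
    bytes_moved = 3 * n_elements * element_size
    transfer_s = bytes_moved / pcie_bytes_per_sec
    compute_s = n_elements * flops_per_element / gpu_flops_per_sec
    return compute_s / transfer_s  # > 1 means compute dominates the transfer

# One multiply per element: the transfer time dwarfs the compute time,
# no matter how long the arrays are.
print(offload_worth_it(1_000_000))
```

Under these assumptions the ratio stays well under 1 for any array length; only when each element gets thousands of operations does the copy pay for itself.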

The compiler needs either to be explicitly told what can/should
be run as a GPU function, or to have some intelligence about what
to run (or not) as a GPU function.

This will get better in the future: APUs using the full HSA
implementation will drastically reduce the "buy-in" latency/cycle cost of using a GPGPU function, and make it practical for
smaller (in instruction count / memory-boundedness) operations.

I didn't literally mean automatically inserting GPU code.

I was more imagining this:

void foo(T)(T[] arr)
{
    useArray(arr);
}

auto a = someLongArray;
auto b = someOtherLongArray;

gpu
{
    auto aGPU = toGPUMem(a);
    auto bGPU = toGPUMem(b);

    auto c = GPUArr(a.length);

    c[] = a[] * b[];

    auto cCPU = toCPUMem(c);
    c.foo();

    dot(c, iota(c.length).array().toGPUMem())
        .foo();
}

gpu T dot(T)(T[] a, T[] b)
{
    //gpu dot product
}


with cpu arrays and gpu arrays identified separately in the type system. Automatic conversions could be possible, but of course that would allow carelessness.

Obviously there is some cpu code mixed in with the gpu code there, which should be executed asynchronously if possible. You could also have
onlyGPU
{
    //only code that can all be executed on the GPU.
}



Just ideas off the top of my head. Definitely full of holes and I haven't really considered the detail :)


You can't mix cpu and gpu code; they must be separate.

auto a = someLongArray;
auto b = someOtherLongArray;

auto aGPU = toGPUMem(a);
auto bGPU = toGPUMem(b);

auto c = GPUArr(a.length);

gpu
{
    // this block is one gpu shader program
    c[] = a[] * b[];
}

auto cCPU = toCPUMem(c);
cCPU.foo();
auto cGPU = toGPUMem(cCPU);
auto dGPU = iota(c.length).array().toGPUMem();

gpu{
    // this block is another wholly separate shader program
    auto resultGPU = dot(cGPU, dGPU);
}

auto resultCPU = toCPUMem(resultGPU);
resultCPU.foo();

gpu T dot(T)(T[] a, T[] b)
{
    //gpu dot product
}


Your example rewritten to fit the gpu.

However, this still has the problem that the CPU has to generate CPU code from the contents of gpu{} code blocks, as the GPU is unable to allocate memory. So, for example,

gpu{
    auto resultGPU = dot(c, cGPU);
}

likely either won't work, or generates an array allocation in cpu code before the gpu block is otherwise run.
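One way out is for the compiler to hoist that allocation out of the gpu block onto the CPU. A rough sketch of the rewrite in Python, with GPU buffers modelled as plain lists and alloc_gpu / dot_kernel as hypothetical stand-ins, not a real API:

```python
# Sketch: the compiler hoists the result allocation out of the gpu block.
# GPU buffers are modelled as plain Python lists.

def alloc_gpu(n):
    """Host-side allocation of a GPU buffer (the GPU itself can't allocate)."""
    return [0.0] * n

def dot_kernel(out, a, b):
    """The kernel only ever writes into pre-allocated memory."""
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]
    out[0] = acc

c = [1.0, 2.0, 3.0]
cGPU = [4.0, 5.0, 6.0]

resultGPU = alloc_gpu(1)        # hoisted: runs on the CPU *before* the gpu block
dot_kernel(resultGPU, c, cGPU)  # the gpu block proper
print(resultGPU[0])
```

The point is only the ordering: everything the kernel writes to must already exist before the block is dispatched.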

Also, how does that dot product function know the correct index range to run on? Are we assuming it knows based on the length of a? While the syntax

c[] = a[] * b[];

is safe for this sort of call, a function is less safe to do this with. With function calls, the range needs to be passed to the function, and you would call the function without the gpu{} block, as the function itself is marked.

auto resultGPU = dot$(0 .. returnLesser(cGPU.length,dGPU.length))(cGPU, dGPU);
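The semantics of that $(range) syntax could roughly be: the caller supplies the index range, and the runtime runs one work item per index. A Python sketch of that dispatch, where launch and dot_item are illustrative stand-ins (the final reduction is done host-side here for simplicity):

```python
# Sketch of dot$(0 .. n)(a, b) dispatch: one work item per index in the
# caller-supplied range. launch() and dot_item() are illustrative, not
# an existing API.

def launch(kernel, index_range, *buffers):
    """Run `kernel` once per index in the caller-supplied range."""
    for i in index_range:
        kernel(i, *buffers)

def dot_item(i, a, b, partial):
    # Each work item handles exactly one index; no cross-item communication.
    partial[i] = a[i] * b[i]

cGPU = [1.0, 2.0, 3.0, 4.0]
dGPU = [10.0, 20.0, 30.0]

n = min(len(cGPU), len(dGPU))   # returnLesser(cGPU.length, dGPU.length)
partial = [0.0] * n
launch(dot_item, range(n), cGPU, dGPU, partial)
result = sum(partial)           # final reduction, host-side in this sketch
print(result)
```

This mirrors how real GPU APIs work: the launch carries an explicit global work size, and the kernel body is written per-index.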



Remember, with GPUs you don't send instructions; you send whole programs, and the whole program must finish before you can move on to the next CPU instruction.
