Pekka Jääskeläinen <[email protected]> writes:

> Hi Andreas,
>
> On 05/27/2013 05:02 AM, Andreas Kloeckner wrote:
>> - Of the things that aren't implemented, clCreateProgramWithBinary would
>>    likely be my next most pressing wish. I find the pocl compiler a bit
>>    slow (compared to e.g. Intel and AMD). There's nothing wrong with
>>    that, but developing a program with 30-ish kernels that all need to be
>>    recompiled for every test run ends up being a pretty miserable
>>    experience.  The idea is that PyOpenCL uses caching with
>>    clCreateProgramWithBinary and thereby can dramatically cut the wait
>>    time to rerun stuff you've already run and only incurs compile waits
>>    for stuff that's new.
>
> clCreateProgramWithBinary should work already,

It doesn't for me. Bug reported.

> , but it doesn't
> speed up much because pocl's "binary format" now stores only the initial
> program bitcode. Most of the compilation time is spent on parallel region
> formation and code generation, which still has to be done from this
> bitcode.
>
> We can improve this by including the final compilation results for all
> the devices in the binary format

That would be great.

> , but I'd rather make the caching completely
> transparent so it works like ccache, behind the scenes, improving the
> compilation speed even when using the source code method, and keep the binary
> API mostly just as an alternative mechanism to provide the kernel input (a
> bit more obfuscated than source code).
>
> Transparent caching can be done by adding hash keys produced from the
> sources and the affecting compiler options etc. to the current temp dir
> structure. You might have noticed that the current compilation result temp
> dir mechanism does cache the results, but only over the program lifetime
> and it has known "false hit" issues:
> https://bugs.launchpad.net/pocl/+bug/1179444

PyOpenCL purposefully defeats lower-level caches by adding a random
constant variable to the code. I introduced this because Apple OS X also
has a lower-level cache with correctness issues... :) Note that if you'd
like to do transparent caching properly, you'd also have to scan include
files for changes.
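
A minimal sketch of such a hash key, assuming the key has to cover the
kernel source, the compiler options, the target device and the text of
every included file. The names cache_key and collect_includes are
hypothetical and are not taken from pocl or PyOpenCL, and the include
scanner is a deliberate simplification: it only follows quoted #include
lines and ignores conditional compilation.

```python
import hashlib
import os
import re

# Simplification: only quoted includes at the start of a line are recognized.
_INCLUDE_RE = re.compile(r'^[ \t]*#[ \t]*include[ \t]+"([^"]+)"', re.MULTILINE)


def collect_includes(source, include_dirs, found=None):
    """Return {path: text} for all quoted includes reachable from source."""
    if found is None:
        found = {}
    for name in _INCLUDE_RE.findall(source):
        for directory in include_dirs:
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                continue
            if path not in found:
                with open(path, "r", encoding="utf-8") as f:
                    found[path] = f.read()
                # Recurse so that headers included by headers are covered too.
                collect_includes(found[path], include_dirs, found)
            break  # first matching directory wins, like a compiler search path
    return found


def _feed(h, text):
    """Hash one length-prefixed field so adjacent fields cannot run together."""
    data = text.encode("utf-8")
    h.update(str(len(data)).encode("ascii") + b":")
    h.update(data)


def cache_key(source, options, device_name, include_dirs=()):
    """Hypothetical cache key over everything assumed to affect the build."""
    h = hashlib.sha256()
    _feed(h, device_name)
    _feed(h, str(len(options)))
    for opt in options:
        _feed(h, opt)
    _feed(h, source)
    for path, text in sorted(collect_includes(source, include_dirs).items()):
        _feed(h, path)
        _feed(h, text)
    return h.hexdigest()
```

With a key like this, editing a header changes the key even when the
kernel source itself is untouched, which is the case a source-only hash
would miss.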

As a result, I'm somewhat indifferent to lower-level caches (as long as
they're correct), but I would really like it if
clCreateProgramWithBinary included the final compilation result. This is
also crucial on large-scale distributed machines that want to avoid
spending time on each node compiling source code and instead just want
to broadcast binaries.
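
The binary caching described above can be sketched roughly as follows,
under stated assumptions. load_or_build is a hypothetical helper, not
PyOpenCL's actual cache code, and build stands in for whatever compiles
the source and returns the device binary as bytes. In PyOpenCL that
would presumably mean building a Program from source and reading its
binaries, then passing the stored bytes back through the
Program(context, devices, binaries) constructor on a cache hit. The
sketch only pays off if the stored binary contains the final
compilation result.

```python
import os
import tempfile


def load_or_build(cache_dir, key, build):
    """Return (binary, was_cached) for the given cache key.

    `build` is a caller-supplied function (an assumption of this sketch)
    that compiles from source and returns the binary as bytes.
    """
    path = os.path.join(cache_dir, key + ".bin")
    try:
        with open(path, "rb") as f:
            return f.read(), True
    except FileNotFoundError:
        pass

    binary = build()

    # Write to a temporary file first and rename it into place, so that a
    # concurrent reader never sees a half-written binary.
    os.makedirs(cache_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(binary)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return binary, False
```

On a cluster, the same bytes could be broadcast to the other nodes
instead of being read from a local file, so that only one node pays the
compile time.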

Andreas


_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
