Pekka Jääskeläinen <[email protected]> writes:

> Hi Andreas,
>
> On 05/27/2013 05:02 AM, Andreas Kloeckner wrote:
>> - Of the things that aren't implemented, clCreateProgramWithBinary would
>>   likely be my next most pressing wish. I find the pocl compiler a bit
>>   slow. (compared to e.g. Intel and AMD) There's nothing wrong with
>>   that, but developing a program with 30-ish kernels that all need to be
>>   recompiled for every test run ends up being a pretty miserable
>>   experience. The idea is that PyOpenCL uses caching with
>>   clCreateProgramWithBinary and thereby can dramatically cut the wait
>>   time to rerun stuff you've already run and only incurs compile waits
>>   for stuff that's new.
>
> clCreateProgramWithBinary should work already,

It doesn't for me. Bug reported.

> , but it doesn't
> speed up much because pocl's "binary format" now stores only the initial
> program bitcode. Most of the compilation time is spent on parallel region
> formation and the code generation which has to be still done from this
> bitcode.
>
> We can improve this by including the final compilation results for all
> the devices in the binary format

That would be great.

> , but I'd rather make the caching completely
> transparent so it works like ccache, behind the scenes, improving the
> compilation speed even when using the source code method, and keep the binary
> API mostly just as an alternative mechanism to provide the kernel input (a
> bit more obfuscated than source code).
>
> Transparent caching can be done by adding hash keys produced from the
> sources and the affecting compiler options etc. to the current temp dir
> structure. You might have noticed that the current compilation result temp
> dir mechanism does cache the results, but only over the program life time
> and it has known "false hit" issues:
> https://bugs.launchpad.net/pocl/+bug/1179444

PyOpenCL purposefully defeats lower-level caches by adding a random
constant variable to the code. I introduced this because Apple OS X also
has a lower-level cache with correctness issues... :)

Note that if you'd like to do this properly, you'd also have to scan
include files for changes.

As a result, I'm somewhat indifferent on lower-level caches (as long as
they're correct), but I would really like it if
clCreateProgramWithBinary included the final compilation result. This is
also crucial on large-scale distributed machines that want to avoid
spending time on each node compiling source code and instead just want
to broadcast binaries.

Andreas
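
The hash-key idea discussed in this thread (key the cache on the sources, on the compiler options that affect the result, and on the contents of any include files) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how pocl or PyOpenCL actually compute their cache keys: the names `collect_includes` and `cache_key` and the `device_id` string are hypothetical, and the sketch follows only quoted `#include "..."` directives resolved against an explicit list of directories, ignoring angle-bracket includes and includes hidden behind macros.

```python
import hashlib
import os
import re

# Matches lines such as:  #include "common.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def collect_includes(source, include_dirs, seen=None):
    """Return {path: text} for every quoted include reachable from source.

    Each include name is resolved against include_dirs in order; the first
    existing file wins. Nested includes are followed recursively.
    """
    if seen is None:
        seen = {}
    for name in INCLUDE_RE.findall(source):
        for directory in include_dirs:
            path = os.path.join(directory, name)
            if os.path.isfile(path):
                if path not in seen:
                    with open(path, "r") as f:
                        text = f.read()
                    seen[path] = text
                    collect_includes(text, include_dirs, seen)
                break
    return seen


def cache_key(source, options, device_id, include_dirs=()):
    """Hash everything assumed to influence the compiled binary."""
    h = hashlib.sha256()

    def feed(s):
        # Length-prefix each field so adjacent fields cannot run together.
        data = s.encode("utf-8")
        h.update(str(len(data)).encode("ascii") + b":" + data)

    feed(device_id)                 # hypothetical device/driver identifier
    feed(str(len(options)))
    for opt in options:
        feed(opt)
    feed(source)
    includes = collect_includes(source, list(include_dirs))
    for path in sorted(includes):   # sorted for a stable key
        feed(path)
        feed(includes[path])
    return h.hexdigest()
```

A cache built on such a key would look up a stored binary under `cache_key(...)` and compile from source only on a miss; editing an included header changes the key, which is the include-file concern raised above.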
