On 02/20/2013 03:06 PM, Yi Ge wrote:
> I have limited knowledge of LLVM, in the pocl source code, I saw
> itemloop and replication, from parallelism perspective, I don't
> understand the difference.
> Can anybody share some knowledge of this?

The short answer is that the wiloops work-group function generation
method creates simple loops that iterate over all the work-items in
the work-group (while respecting barrier boundaries). The replication
method does not create loops; it replicates the work-item code once
per work-item, one copy directly after another. Thus, a fully unrolled
wiloops-generated work-group function should, in theory, end up being
the same as the one generated by the replication method.
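To make the difference concrete, here is a rough sketch in plain C of
what the two methods conceptually produce for a trivial kernel without
barriers. The kernel body, the fixed LOCAL_SIZE and the function names
are made up for illustration; the real output is LLVM IR, not C.

```c
#include <stddef.h>

#define LOCAL_SIZE 4 /* hypothetical fixed work-group size */

/* Hypothetical kernel body: out[lid] = in[lid] * 2 + 1 */

/* wiloops style: one loop iterates over the work-items. */
void wg_wiloops(const int *in, int *out)
{
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
        out[lid] = in[lid] * 2 + 1;
}

/* replication style: the work-item code is copied once per work-item. */
void wg_replicated(const int *in, int *out)
{
    out[0] = in[0] * 2 + 1; /* work-item 0 */
    out[1] = in[1] * 2 + 1; /* work-item 1 */
    out[2] = in[2] * 2 + 1; /* work-item 2 */
    out[3] = in[3] * 2 + 1; /* work-item 3 */
}
```

Fully unrolling the loop in wg_wiloops gives exactly the body of
wg_replicated, which is the "in theory the same" case.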

However, this is currently not entirely true, due to differences in
the parallelism metadata and because the replication method might
preserve more scalar variables in the code: it does not have to
scalarize back the work-group context arrays that the wiloops method
generates. Scalar variables can be allocated to registers by the
register allocator, which avoids memory accesses.
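The context array point can be sketched in plain C as well. Assume a
made-up kernel where a private variable t is live across a barrier.
With wiloops the region before the barrier and the region after it
become separate loops, so t has to be saved per work-item in an array.
With replication each copy simply keeps its own scalar. All names and
the kernel are hypothetical; this only illustrates the idea.

```c
#include <stddef.h>

#define LOCAL_SIZE 4 /* hypothetical fixed work-group size */

/* Hypothetical kernel:
 *   int t = in[lid] * 2;
 *   scratch[lid] = t;
 *   barrier(CLK_LOCAL_MEM_FENCE);
 *   out[lid] = t + scratch[(lid + 1) % LOCAL_SIZE];
 */

/* wiloops style: one loop per barrier region, t kept in a context array. */
void wg_wiloops(const int *in, int *out)
{
    int scratch[LOCAL_SIZE];
    int t_ctx[LOCAL_SIZE]; /* one slot of t per work-item */

    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid) {
        int t = in[lid] * 2;
        scratch[lid] = t;
        t_ctx[lid] = t; /* save t across the barrier */
    }
    /* barrier boundary */
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
        out[lid] = t_ctx[lid] + scratch[(lid + 1) % LOCAL_SIZE];
}

/* replication style: each copy of t stays a plain scalar. */
void wg_replicated(const int *in, int *out)
{
    int scratch[LOCAL_SIZE];
    int t0 = in[0] * 2, t1 = in[1] * 2, t2 = in[2] * 2, t3 = in[3] * 2;

    scratch[0] = t0; scratch[1] = t1; scratch[2] = t2; scratch[3] = t3;
    /* barrier boundary */
    out[0] = t0 + scratch[1];
    out[1] = t1 + scratch[2];
    out[2] = t2 + scratch[3];
    out[3] = t3 + scratch[0];
}
```

In the replicated version t0..t3 are candidates for registers as they
are. In the wiloops version the compiler first has to unroll the loops
and turn t_ctx back into scalars to reach the same result.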

For work-group vectorization, the WIVectorizer (originally the BB
vectorizer) works on the fully replicated (unrolled) output and uses
custom metadata to aid the vectorization. The wiloops method is a
better match for a loop vectorizer, and I've been working towards
making better use of LLVM's inner loop vectorizer for work-group
autovectorization.
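As a rough C-level analogy of why the loop form suits a loop
vectorizer: the work-item loop below has independent iterations, and
the restrict qualifiers plus the Clang loop pragma stand in for the
parallelism information that pocl conveys through metadata in the IR.
This is only an assumption-laden illustration (made-up kernel and
names), not what pocl emits.

```c
#include <stddef.h>

#define LOCAL_SIZE 8 /* hypothetical fixed work-group size */

/* Hypothetical kernel body: c[lid] = a[lid] + b[lid] */
void wg_wiloop_vec(const float *restrict a, const float *restrict b,
                   float *restrict c)
{
    /* Hint that the iterations (work-items) may be vectorized.
       Guarded because the pragma is Clang-specific. */
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
    for (size_t lid = 0; lid < LOCAL_SIZE; ++lid)
        c[lid] = a[lid] + b[lid];
}
```

A loop vectorizer can turn several iterations of this loop into one
vector operation, that is, execute several work-items per instruction.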

So, maybe in the future the "replication method" can be replaced with
wiloops that are unrolled as many times as wanted.

-- 
Pekka

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
