On 02/20/2013 03:06 PM, Yi Ge wrote:
> I have limited knowledge of LLVM, in the pocl source code, I saw
> itemloop and replication, from parallelism perspective, I don't
> understand the difference.
> Can anybody share some knowledge of this?

The short answer is that the wiloops work-group function generation method creates simple loop structures that iterate over all the work-items in the kernel (while respecting barrier boundaries). The replication method does not create loops; instead it replicates the work-item code, one copy directly after another.

Thus, a fully unrolled work-group function generated with wiloops should, in theory, end up being the same as one generated with the replication method. However, this is currently not entirely true, due to differences in the parallelism metadata, and also because the replication method might preserve more scalar variables in the code, as it doesn't have to scalarize back the work-group context arrays that were generated for wiloops. Scalar variables can be allocated to registers by the register allocator, which avoids memory accesses.

For work-group vectorization, the WIVectorizer (originally the BB vectorizer) works on the fully replicated (unrolled) output and has custom metadata to aid the vectorization. The wiloops method is a better match for a loop vectorizer, and I've been working towards making better use of LLVM's inner loop vectorizer for work-group autovectorization.

So, maybe in the future the "replication method" can be replaced with wiloops that are unrolled as many times as wanted.

--
Pekka

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
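
To make the difference described above concrete, here is a minimal sketch in plain C of what the two work-group function shapes might look like for a tiny kernel with one barrier. It is hand-written for illustration and is not actual pocl output: the kernel itself, the fixed work-group size LOCAL_SIZE, and the names wg_wiloops, wg_replicated, tmp and x_ctx are all assumptions made up for this example. The private variable x is live across the barrier, so the loop version has to park it in a context array, while the replicated version can keep it in separate scalars.

```c
#define LOCAL_SIZE 4 /* hypothetical fixed work-group size */

/*
 * Hypothetical kernel body, per work-item `id`:
 *   x = in[id] + 1;  tmp[id] = in[id] * 2;
 *   barrier();
 *   out[id] = x + tmp[(id + 1) % LOCAL_SIZE];
 */

/* wiloops style: one loop per barrier-delimited region. */
void wg_wiloops(const int *in, int *out)
{
    int tmp[LOCAL_SIZE];   /* stands in for local memory */
    int x_ctx[LOCAL_SIZE]; /* context array: x is live across the barrier */

    for (int id = 0; id < LOCAL_SIZE; ++id) {
        int x = in[id] + 1;
        tmp[id] = in[id] * 2;
        x_ctx[id] = x;     /* save the private variable for the next region */
    }
    /* barrier: every work-item has finished the first region */
    for (int id = 0; id < LOCAL_SIZE; ++id) {
        int x = x_ctx[id]; /* restore the private variable */
        out[id] = x + tmp[(id + 1) % LOCAL_SIZE];
    }
}

/* Replication style: the work-item code is copied once per work-item. */
void wg_replicated(const int *in, int *out)
{
    int tmp[LOCAL_SIZE];   /* stands in for local memory */

    /* x stays in four separate scalars, which can live in registers */
    int x0 = in[0] + 1; tmp[0] = in[0] * 2;
    int x1 = in[1] + 1; tmp[1] = in[1] * 2;
    int x2 = in[2] + 1; tmp[2] = in[2] * 2;
    int x3 = in[3] + 1; tmp[3] = in[3] * 2;
    /* barrier: all copies of the first region come before this point */
    out[0] = x0 + tmp[1];
    out[1] = x1 + tmp[2];
    out[2] = x2 + tmp[3];
    out[3] = x3 + tmp[0];
}
```

Both functions compute the same result for the same input. Fully unrolling the two loops in wg_wiloops gives code of the same shape as wg_replicated, except that x still goes through x_ctx until an optimization pass turns the array back into scalars.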
