Re: [PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)
Hello Michael, A very good Morning to you. On Wed, Nov 20, 2019 at 10:58 PM Michael Kruse wrote: > Am Mi., 20. Nov. 2019 um 10:21 Uhr schrieb HAPPY Mahto > : > >> #pragma clang loop vectorize_assume_alignment(32) > >> for(int i = 0;i < n; i++){ > >> a[i] = b[i] + i*i; > >> } > > > > for this all-access inside the loop will be aligned to 32bit, > > ex IR > >> > >> for.cond: ; preds = %for.inc, > %entry > >> %5 = load i32, i32* %i, align 32, !llvm.access.group !2 > >> %6 = load i32, i32* %n, align 32, !llvm.access.group !2 > >> %cmp = icmp slt i32 %5, %6 > >> br i1 %cmp, label %for.body, label %for.end > >> > >> for.body: ; preds = %for.cond > >> %7 = load i32, i32* %i, align 32, !llvm.access.group !2 > >> %8 = load i32, i32* %i, align 32, !llvm.access.group !2 > >> %idxprom = sext i32 %8 to i64 > >> %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom > >> store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2 > >> br label %for.inc > >> > >> for.inc: ; preds = %for.body > >> %9 = load i32, i32* %i, align 32, !llvm.access.group !2 > >> %inc = add nsw i32 %9, 1 > >> store i32 %inc, i32* %i, align 32, !llvm.access.group !2 > >> br label %for.cond, !llvm.loop !3 > > > > You will not need to create pointers for every array(or operand you want > to perform the operation on). > > IMHO it is better if the programmer has to. It is not always obvious > which arrays are used in the loop. Also, the information can be used > by other optimzations that the vectorizer. > > We wrote this pragma in by keeping in mind that arrays inside the important loops are used for vectorization and it'll be good for them to be in aligned manner. for( int i = 0; i < n; i++){ a[i] = b[i] + c[i]; d[i] = e[i] + i*i; } If there are more than 4-5 arrays inside loop being used then it'll be extra effort to define all of them as _builtin_assume_aligned, we're thinking of letting that thing handled by pragma itself. It'll be very helpful for us if other people from community can give their views on this. > >> > >> void mult(float* x, int size, float factor){ > >> float* ax = (float*)__builtin_assume_aligned(x, 64); > >> for (int i = 0; i < size; ++i) > >> ax[i] *= factor; > >> } > > https://godbolt.org/z/Fd6HMe > > > the alignment is assumed whereas in #pragma it is set to the number > specified. > > Semantically, it is the same. > > I wonder how you expect the assembly output to change? The > __builtin_assume_aligned, will be picked up by the backend and result > in movaps to be used instead of movups. > > > > it'll be easier, and having a pragma for doing this will help as it's > provided in OMP and intel compilers. > > This is a compiler-specific extension. It does not have an influence > on what other compilers do. Even with clang, if you try to do > > #pragma clang loop vectorize_assume_alignment(32) > #pragma omp simd > for (int i = 0; i < size; ++i) > > clang will silently swallow the vectorize_assume_alignment. > > > Michael > ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
Re: [PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)
Am Mi., 20. Nov. 2019 um 10:21 Uhr schrieb HAPPY Mahto : >> #pragma clang loop vectorize_assume_alignment(32) >> for(int i = 0;i < n; i++){ >> a[i] = b[i] + i*i; >> } > > for this all-access inside the loop will be aligned to 32bit, > ex IR >> >> for.cond: ; preds = %for.inc, %entry >> %5 = load i32, i32* %i, align 32, !llvm.access.group !2 >> %6 = load i32, i32* %n, align 32, !llvm.access.group !2 >> %cmp = icmp slt i32 %5, %6 >> br i1 %cmp, label %for.body, label %for.end >> >> for.body: ; preds = %for.cond >> %7 = load i32, i32* %i, align 32, !llvm.access.group !2 >> %8 = load i32, i32* %i, align 32, !llvm.access.group !2 >> %idxprom = sext i32 %8 to i64 >> %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom >> store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2 >> br label %for.inc >> >> for.inc: ; preds = %for.body >> %9 = load i32, i32* %i, align 32, !llvm.access.group !2 >> %inc = add nsw i32 %9, 1 >> store i32 %inc, i32* %i, align 32, !llvm.access.group !2 >> br label %for.cond, !llvm.loop !3 > > You will not need to create pointers for every array(or operand you want to > perform the operation on). IMHO it is better if the programmer has to. It is not always obvious which arrays are used in the loop. Also, the information can be used by other optimzations that the vectorizer. >> >> void mult(float* x, int size, float factor){ >> float* ax = (float*)__builtin_assume_aligned(x, 64); >> for (int i = 0; i < size; ++i) >> ax[i] *= factor; >> } https://godbolt.org/z/Fd6HMe > the alignment is assumed whereas in #pragma it is set to the number specified. Semantically, it is the same. I wonder how you expect the assembly output to change? The __builtin_assume_aligned, will be picked up by the backend and result in movaps to be used instead of movups. > it'll be easier, and having a pragma for doing this will help as it's > provided in OMP and intel compilers. This is a compiler-specific extension. It does not have an influence on what other compilers do. Even with clang, if you try to do #pragma clang loop vectorize_assume_alignment(32) #pragma omp simd for (int i = 0; i < size; ++i) clang will silently swallow the vectorize_assume_alignment. Michael ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
Re: [PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)
Hello Michael, Very sorry for the late reply, we had exams and assignments this week and I had to read about _builtin_assume_aligned as I didn't come across this. #pragma clang loop vectorize_assume_alignment(32) > for(int i = 0;i < n; i++){ > a[i] = b[i] + i*i; > } > for this all-access inside the loop will be aligned to 32bit, ex IR > for.cond: ; preds = %for.inc, > %entry > %5 = load i32, i32* %i, align 32, !llvm.access.group !2 > %6 = load i32, i32* %n, align 32, !llvm.access.group !2 > %cmp = icmp slt i32 %5, %6 > br i1 %cmp, label %for.body, label %for.end > > for.body: ; preds = %for.cond > %7 = load i32, i32* %i, align 32, !llvm.access.group !2 > %8 = load i32, i32* %i, align 32, !llvm.access.group !2 > %idxprom = sext i32 %8 to i64 > %arrayidx = getelementptr inbounds i32, i32* %vla1, i64 %idxprom > store i32 %7, i32* %arrayidx, align 32, !llvm.access.group !2 > br label %for.inc > > for.inc: ; preds = %for.body > %9 = load i32, i32* %i, align 32, !llvm.access.group !2 > %inc = add nsw i32 %9, 1 > store i32 %inc, i32* %i, align 32, !llvm.access.group !2 > br label %for.cond, !llvm.loop !3 > You will not need to create pointers for every array(or operand you want to perform the operation on). > void mult(float* x, int size, float factor){ > float* ax = (float*)__builtin_assume_aligned(x, 64); > for (int i = 0; i < size; ++i) > ax[i] *= factor; > } > the IR generated for this : > define void @mult(i32*, i32, float) #0 { > %4 = alloca i32*, align 8 > %5 = alloca i32, align 4 > %6 = alloca float, align 4 > %7 = alloca i32*, align 8 > %8 = alloca i32, align 4 > store i32* %0, i32** %4, align 8 > store i32 %1, i32* %5, align 4 > store float %2, float* %6, align 4 > %9 = load i32*, i32** %4, align 8 > %10 = bitcast i32* %9 to i8* > %11 = ptrtoint i8* %10 to i64 > %12 = and i64 %11, 63 > %13 = icmp eq i64 %12, 0 > call void @llvm.assume(i1 %13) > %14 = bitcast i8* %10 to i32* > store i32* %14, i32** %7, align 8 > store i32 0, i32* %8, align 4 > br label %15 > > ; :15: ; preds = %29, %3 > %16 = load i32, i32* %8, align 4 > %17 = load i32, i32* %5, align 4 > %18 = icmp slt i32 %16, %17 > br i1 %18, label %19, label %32 > > ; :19: ; preds = %15 > %20 = load float, float* %6, align 4 > %21 = load i32*, i32** %7, align 8 > %22 = load i32, i32* %8, align 4 > %23 = sext i32 %22 to i64 > %24 = getelementptr inbounds i32, i32* %21, i64 %23 > %25 = load i32, i32* %24, align 4 > %26 = sitofp i32 %25 to float > %27 = fmul float %26, %20 > %28 = fptosi float %27 to i32 > store i32 %28, i32* %24, align 4 > br label %29 > > ; :29: ; preds = %19 > %30 = load i32, i32* %8, align 4 > %31 = add nsw i32 %30, 1 > store i32 %31, i32* %8, align 4 > br label %15 > > ; :32: ; preds = %15 > ret void > } > the alignment is assumed whereas in #pragma it is set to the number specified. it'll be easier, and having a pragma for doing this will help as it's provided in OMP and intel compilers. Thank you, If I made any mistake please tell me. Happy Mahto CSE Undergrad, IIT Hyderabad On Thu, Nov 14, 2019 at 10:32 PM Michael Kruse via Phabricator < revi...@reviews.llvm.org> wrote: > Meinersbur added a comment. > > Could you elaborate why this is better than `__builtin_assume_aligned`? > > > Repository: > rG LLVM Github Monorepo > > CHANGES SINCE LAST ACTION > https://reviews.llvm.org/D69897/new/ > > https://reviews.llvm.org/D69897 > > > > ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)
Meinersbur added a comment. Could you elaborate why this is better than `__builtin_assume_aligned`? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69897/new/ https://reviews.llvm.org/D69897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)
lebedev.ri added a comment. To reword, if `vectorize_assume_alignment(32)` is *NOT* lowered via `CodeGenFunction::EmitAlignmentAssumption()` in clang frontend, but lowered into `LoopAttributes::VectorizeAssumeAlignment`, then once the alignment that was specified does not match reality, there will be a miscompile instead of an UBSan error. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69897/new/ https://reviews.llvm.org/D69897 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D69897: Add #pragma clang loop vectorize_assume_alignment(n)
m-happy updated this revision to Diff 229190. m-happy added a comment. Updated the syntax as suggested by Michael Kruse, and added functionality to specify the number used for alignment instead of just using 32bits. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69897/new/ https://reviews.llvm.org/D69897 Files: clang/docs/LanguageExtensions.rst clang/include/clang/Basic/Attr.td clang/include/clang/Basic/AttrDocs.td clang/include/clang/Basic/DiagnosticParseKinds.td clang/lib/CodeGen/CGLoopInfo.cpp clang/lib/CodeGen/CGLoopInfo.h clang/lib/Parse/ParsePragma.cpp clang/lib/Sema/SemaStmtAttr.cpp clang/test/AST/ast-print-pragmas.cpp clang/test/CodeGenCXX/pragma-loop-aligned.cpp clang/test/Parser/pragma-loop.cpp clang/test/Parser/pragma-unroll-and-jam.cpp Index: clang/test/Parser/pragma-unroll-and-jam.cpp === --- clang/test/Parser/pragma-unroll-and-jam.cpp +++ clang/test/Parser/pragma-unroll-and-jam.cpp @@ -67,7 +67,7 @@ } // pragma clang unroll_and_jam is disabled for the moment -/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, pipeline, pipeline_initiation_interval, vectorize_predicate, or distribute}} */ #pragma clang loop unroll_and_jam(4) +/* expected-error {{invalid option 'unroll_and_jam'; expected vectorize, vectorize_width, interleave, interleave_count, unroll, unroll_count, pipeline, pipeline_initiation_interval, vectorize_predicate, vectorize_assume_alignment, or distribute}} */ #pragma clang loop unroll_and_jam(4) for (int i = 0; i < Length; i++) { for (int j = 0; j < Length; j++) { List[i * Length + j] = Value; Index: clang/test/Parser/pragma-loop.cpp === --- clang/test/Parser/pragma-loop.cpp +++ clang/test/Parser/pragma-loop.cpp @@ -18,42 +18,36 @@ template void test_nontype_template_vectorize(int *List, int Length) { - /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop vectorize_width(V) - for (int i = 0; i < Length; i++) { + /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop vectorize_width(V) for (int i = 0; i < Length; i++) { List[i] = i; } - /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop vectorize_width(V / 2) - for (int i = 0; i < Length; i++) { + /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop vectorize_width(V / 2) for (int i = 0; i < Length; i++) { List[i] += i; } } template void test_nontype_template_interleave(int *List, int Length) { - /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop interleave_count(I) - for (int i = 0; i < Length; i++) { + /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop interleave_count(I) for (int i = 0; i < Length; i++) { List[i] = i; } - /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop interleave_count(2 % I) - for (int i = 0; i < Length; i++) { + /* expected-error {{invalid value '0'; must be positive}} */ #pragma clang loop interleave_count(2 % I) for (int i = 0; i < Length; i++) { List[i] = i; } } template void test_nontype_template_char(int *List, int Length) { - /* expected-error {{invalid argument of type 'char'; expected an integer type}} */ #pragma clang loop vectorize_width(V) - for (int i = 0; i < Length; i++) { + /* expected-error {{invalid argument of type 'char'; expected an integer type}} */ #pragma clang loop vectorize_width(V) for (int i = 0; i < Length; i++) { List[i] = i; } } template void test_nontype_template_bool(int *List, int Length) { - /* expected-error {{invalid argument of type 'bool'; expected an integer type}} */ #pragma clang loop vectorize_width(V) - for (int i = 0; i < Length; i++) { + /* expected-error {{invalid argument of type 'bool'; expected an integer type}} */ #pragma clang loop vectorize_width(V) for (int i = 0; i < Length; i++) { List[i] = i; } } @@ -61,8 +55,8 @@ template void test_nontype_template_badarg(int *List, int Length) { /* expected-error {{use of undeclared identifier 'Vec'}} */ #pragma clang loop vectorize_width(Vec) interleave_count(I) - /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop vectorize_width(V) interleave_count(Int) - for (int i = 0; i < Length; i++) { + /* expected-error {{use of undeclared identifier 'Int'}} */ #pragma clang loop + vectorize_width(V) interleave_count(Int) for (int i = 0; i < Length; i++) { List[i] = i; } } @@ -70,8 +64,7 @@ template void test_type_template_vectorize(int *List, int Length) { const T Value = -1; - /* expected-error {{invalid value '-1'; must be positive}} */ #pragma clang loop