[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2020-02-28 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

The last patch from my side just went in (D74162: [Inliner] Inlining should
honor nobuiltin attributes). So I think this is now complete.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2020-02-10 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet abandoned this revision.
gchatelet added a comment.

This has been implemented in the following patches:

- https://reviews.llvm.org/D67923
- https://reviews.llvm.org/D74162
- https://reviews.llvm.org/D73543
- https://reviews.llvm.org/D71710


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2020-01-09 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

Thx for the summary @tejohnson.

In D61634#1808265 , @tejohnson wrote:

> >> 3. Propagate/merge the `no-builtin` IR attribute for LTO by "updating 
> >> `AttributeFuncs::areInlineCompatible` and/or 
> >> `AttributeFuncs::mergeAttributesForInlining` and adding a new MergeRule in 
> >> `include/llvm/IR/Attributes.td` and writing a function like 
> >> `adjustCallerStackProbeSize`."
> > 
> > This one isn't about LTO, but rather the inliner. You could have different 
> > functions in the same module even without LTO that have incompatible 
> > no-builtin attributes. There isn't any propagation required for LTO.
>
> Not done yet - I can work on this.


That would be great!

> 
> 
>> 
>> 
>>> 4. Get inspiration from `TargetTransformInfo` to get `TargetLibraryInfo` on 
>>> a per function basis for all passes and respect the IR attribute.
> 
> Done (D67923 was the last patch in the
> series to enable this, committed at 878ab6df033d).
>
> I'm not quite sure where D71710
> ([intrinsics] Add @llvm.memcpy.inline intrinsics) fits into the above list.

I believe it does fit in.

> Anything else missing?

Yes: once the intrinsic is in, we need a way to access it from C++, so a Clang
builtin is necessary. I'll take care of it.
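
For illustration, a minimal sketch of what that C++ access could look like; the
__builtin_memcpy_inline(dst, src, size) spelling and its constant-size
requirement are taken from the discussion in this thread, not from a finalized
interface:

  // Illustrative sketch only: assumes a __builtin_memcpy_inline builtin that
  // lowers to llvm.memcpy.inline and is never turned into a libc memcpy call.
  void copy16(char *dst, const char *src) {
    __builtin_memcpy_inline(dst, src, 16); // expands to inline loads/stores
  }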


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2020-01-07 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1515176 , @tejohnson wrote:

> In D61634#1512020 , @gchatelet wrote:
>
> > AFAIU here is a coarse plan of what needs to happen
>


I've listed below what I believe is the status:

>> 
>> 
>> 1. Add a `no-builtin` clang function attribute that has the same semantics as 
>> the `no-builtin` cmd line argument 
>> 

Done (D68028, committed at bd8791610948).

>> 2. Propagate it to the IR.
>>   - In the light of recent discussions and as @theraven suggested it seems 
>> more appropriate to encode them as individual IR attributes (e.g. 
>> `"no-builtin-memcpy"`, `"no-builtin-sqrt"`, etc..)

Done (also in D68028, committed at bd8791610948).
Additionally the -fno-builtin* options were translated to the IR attributes in
D71193 (committed at 0508c994f0b1).

>> 3. Propagate/merge the `no-builtin` IR attribute for LTO by "updating 
>> `AttributeFuncs::areInlineCompatible` and/or 
>> `AttributeFuncs::mergeAttributesForInlining` and adding a new MergeRule in 
>> `include/llvm/IR/Attributes.td` and writing a function like 
>> `adjustCallerStackProbeSize`."
> 
> This one isn't about LTO, but rather the inliner. You could have different 
> functions in the same module even without LTO that have incompatible 
> no-builtin attributes. There isn't any propagation required for LTO.

Not done yet - I can work on this.

> 
> 
>> 4. Get inspiration from `TargetTransformInfo` to get `TargetLibraryInfo` on 
>> a per function basis for all passes and respect the IR attribute.

Done (D67923 was the last patch in the series
to enable this, committed at 878ab6df033d).

I'm not quite sure where D71710
([intrinsics] Add @llvm.memcpy.inline intrinsics) fits into the above list.

Anything else missing?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-12-19 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert resigned from this revision.
jdoerfert added a comment.

This has enough active reviewers as it is.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-11-12 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1732884 , @tejohnson wrote:

> In D61634#1682660 , @gchatelet wrote:
>
> > In D61634#1679331 , @tejohnson 
> > wrote:
> >
> > > In D61634#1635595 , @tejohnson 
> > > wrote:
> > >
> > > > I had some time to work on this finally late last week. I decided the 
> > > > most straightforward thing was to implement the necessary interface 
> > > > changes to the TLI analysis to make it require a Function (without any 
> > > > changes yet to how that analysis operates). See D66428 
> > > >  that I just mailed for review. That 
> > > > takes care of the most widespread changes needed for this migration, 
> > > > and afterwards we can change the analysis to look at the function 
> > > > attributes and make a truly per-function TLI.
> > >
> > >
> > > D66428 went in a few weeks ago at
> > > r371284, and I just mailed the follow-on patch D67923,
> > > which adds the support into the
> > > TLI analysis to use the Function to override the available builtins (with
> > > some of the code stubbed out since we don't yet have those per-Function
> > > attributes finalized).
> > >
> > > @gchatelet where are you at on finalizing this patch? Also, I mentioned 
> > > this offline but to follow up here: I think we will want an attribute to 
> > > represent -fno-builtins (so that it doesn't need to be expanded out into 
> > > the full list of individual no-builtin-{func} attributes, which would be 
> > > both more verbose and less efficient, as well as being less backward 
> > > compatible when new builtin funcs are added).
> >
> >
> > I'll break this patch in several pieces. The first one is to add the 
> > `no_builtin` attribute, see https://reviews.llvm.org/D61634.
>
>
> Are you planning to add a follow-on patch that translates the various
> -fno-builtin* options to these attributes? Once that is done I can refine
> D67923 to actually set the TLI availability
> from the attributes and remove the clang functionality that does this from
> the options.


The `no-builtin` attribute is in as of
rG98f3151a7dded8838fafcb5f46e6c8358def96b8. I've
started working on getting the `-fno-builtin` flags to propagate to `no-builtin`
function attributes. I'll ping back when it's ready.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-11-04 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1682660 , @gchatelet wrote:

> In D61634#1679331 , @tejohnson wrote:
>
> > In D61634#1635595 , @tejohnson 
> > wrote:
> >
> > > I had some time to work on this finally late last week. I decided the 
> > > most straightforward thing was to implement the necessary interface 
> > > changes to the TLI analysis to make it require a Function (without any 
> > > changes yet to how that analysis operates). See D66428 
> > >  that I just mailed for review. That 
> > > takes care of the most widespread changes needed for this migration, and 
> > > afterwards we can change the analysis to look at the function attributes 
> > > and make a truly per-function TLI.
> >
> >
> > D66428 went in a few weeks ago at
> > r371284, and I just mailed the follow-on patch D67923,
> > which adds the support into the TLI
> > analysis to use the Function to override the available builtins (with some
> > of the code stubbed out since we don't yet have those per-Function
> > attributes finalized).
> >
> > @gchatelet where are you at on finalizing this patch? Also, I mentioned 
> > this offline but to follow up here: I think we will want an attribute to 
> > represent -fno-builtins (so that it doesn't need to be expanded out into 
> > the full list of individual no-builtin-{func} attributes, which would be 
> > both more verbose and less efficient, as well as being less backward 
> > compatible when new builtin funcs are added).
>
>
> I'll break this patch in several pieces. The first one is to add the 
> `no_builtin` attribute, see https://reviews.llvm.org/D61634.


Are you planning to add a follow-on patch that translates the various
-fno-builtin* options to these attributes? Once that is done I can refine
D67923 to actually set the TLI availability
from the attributes and remove the clang functionality that does this from the
options.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-09-25 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1679331 , @tejohnson wrote:

> In D61634#1635595 , @tejohnson wrote:
>
> > I had some time to work on this finally late last week. I decided the most 
> > straightforward thing was to implement the necessary interface changes to 
> > the TLI analysis to make it require a Function (without any changes yet to 
> > how that analysis operates). See D66428  
> > that I just mailed for review. That takes care of the most widespread 
> > changes needed for this migration, and afterwards we can change the 
> > analysis to look at the function attributes and make a truly per-function 
> > TLI.
>
>
> D66428 went in a few weeks ago at r371284,
> and I just mailed the follow-on patch D67923,
> which adds the support into the TLI
> analysis to use the Function to override the available builtins (with some of
> the code stubbed out since we don't yet have those per-Function attributes
> finalized).
>
> @gchatelet where are you at on finalizing this patch? Also, I mentioned this 
> offline but to follow up here: I think we will want an attribute to represent 
> -fno-builtins (so that it doesn't need to be expanded out into the full list 
> of individual no-builtin-{func} attributes, which would be both more verbose 
> and less efficient, as well as being less backward compatible when new 
> builtin funcs are added).


I'll break this patch in several pieces. The first one is to add the 
`no_builtin` attribute, see https://reviews.llvm.org/D61634.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-09-23 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1635595 , @tejohnson wrote:

> I had some time to work on this finally late last week. I decided the most 
> straightforward thing was to implement the necessary interface changes to the 
> TLI analysis to make it require a Function (without any changes yet to how 
> that analysis operates). See D66428  that I 
> just mailed for review. That takes care of the most widespread changes needed 
> for this migration, and afterwards we can change the analysis to look at the 
> function attributes and make a truly per-function TLI.


D66428 went in a few weeks ago at r371284,
and I just mailed the follow-on patch D67923,
which adds the support into the TLI analysis to use the Function to
override the available builtins (with some of the code stubbed out since we
don't yet have those per-Function attributes finalized).

@gchatelet where are you at on finalizing this patch? Also, I mentioned this 
offline but to follow up here: I think we will want an attribute to represent 
-fno-builtins (so that it doesn't need to be expanded out into the full list of 
individual no-builtin-{func} attributes, which would be both more verbose and 
less efficient, as well as being less backward compatible when new builtin 
funcs are added).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-08-19 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.
Herald added a reviewer: jdoerfert.

I had some time to work on this finally late last week. I decided the most 
straightforward thing was to implement the necessary interface changes to the 
TLI analysis to make it require a Function (without any changes yet to how that 
analysis operates). See D66428  that I just 
mailed for review. That takes care of the most widespread changes needed for 
this migration, and afterwards we can change the analysis to look at the 
function attributes and make a truly per-function TLI.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-07-16 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1586047 , @tejohnson wrote:

> Checking in to see where we are on this issue. I haven't had any time to work 
> on 4 but hopefully can start on that soon. But it needs the first part done 
> to be effective.


Thx for the heads up @tejohnson.
The patch is missing a bit of documentation but it shouldn't be too far from 
complete:

- it adds a clang function attribute, e.g. 
`__attribute__((no_builtin("memcpy")))`
- instructs clang to compose individual IR attributes from the clang attribute 
above, e.g. `no-builtin-memcpy`, `no-builtin-memset`, `no-builtin-sqrt`...
- adds a specific builtin to clang for the memcpy case `__builtin_memcpy_inline`
- adds an LLVM IR intrinsic `int_memcpy_inline`
- adds an LLVM Instruction `MemCpyInlineInst`
- instructs LLVM to forward the `no-builtin-memcpy` IR attribute from the 
function declaration to the actual memcpy calls inside the function's body 
(same for `memset` and `memmove`)
- adds code to lower `MemCpyInlineInst`, using `DAG.getMemcpy`
with `always_inline` set.

Left to do:

- finish implementing memset / memmove
- check attributes and reject the ones that are not implemented
- add documentation

There are too many things going on in this patch and it may be worth splitting it.
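
Taken together, the pieces above are meant to allow something like the
following sketch (written against the interfaces as described in this revision,
i.e. the no_builtin clang attribute plus __builtin_memcpy_inline; not a final
interface):

  // Sketch of a freestanding memcpy written in C++ with the pieces above.
  // The attribute keeps the optimizer from recognizing the body as a memcpy
  // idiom and re-introducing a libcall; the builtin always expands inline.
  extern "C" __attribute__((no_builtin("memcpy")))
  void *memcpy(void *dst, const void *src, unsigned long size) {
    char *d = static_cast<char *>(dst);
    const char *s = static_cast<const char *>(src);
    if (size == 16) {
      __builtin_memcpy_inline(d, s, 16); // fixed-size fast path, no libcall
      return dst;
    }
    for (unsigned long i = 0; i < size; ++i) // generic fallback, must stay a loop
      d[i] = s[i];
    return dst;
  }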


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-07-15 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1515176 , @tejohnson wrote:

> In D61634#1512020 , @gchatelet wrote:
>
> > AFAIU here is a coarse plan of what needs to happen
> >
> > 1. Add a `no-builtin` clang function attribute that has the same semantics 
> > as the `no-builtin` cmd line argument 
> > 
> > 2. Propagate it to the IR.
> >   - In the light of recent discussions and as @theraven suggested it seems 
> > more appropriate to encode them as individual IR attributes (e.g. 
> > `"no-builtin-memcpy"`, `"no-builtin-sqrt"`, etc..)
> > 3. Propagate/merge the `no-builtin` IR attribute for LTO by "updating 
> > `AttributeFuncs::areInlineCompatible` and/or 
> > `AttributeFuncs::mergeAttributesForInlining` and adding a new MergeRule in 
> > `include/llvm/IR/Attributes.td` and writing a function like 
> > `adjustCallerStackProbeSize`."
>
>
> This one isn't about LTO, but rather the inliner. You could have different 
> functions in the same module even without LTO that have incompatible 
> no-builtin attributes. There isn't any propagation required for LTO.
>
> > 4. Get inspiration from `TargetTransformInfo` to get `TargetLibraryInfo` on 
> > a per function basis for all passes and respect the IR attribute.
> > 
> >   I'm not familiar with 3 and 4 but I can definitely have a look. I'll 
> > update this patch to do 1 and 2 in the meantime. @tejohnson let me know how 
> > you want to proceed for your related patch 
> > . I'm happy to help if I can.
>
> I will mark that one obsolete. I can work on 4; it may just take some time to
> get it all plumbed.


Checking in to see where we are on this issue. I haven't had any time to work 
on 4 but hopefully can start on that soon. But it needs the first part done to 
be effective.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-06-06 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet updated this revision to Diff 203386.
gchatelet added a comment.

- Add documentation.
- Fix permissive HasNoRuntimeAttribute
- Mark interleave as disabled in the documentation.
- Use no-builtin instead of no-runtime-for.
- Adding an llvm.memcpy.inline intrinsic.
- Adding __builtin_memcpy_inline clang builtin.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td
  clang/include/clang/Basic/Builtins.def
  clang/lib/CodeGen/CGBuilder.h
  clang/lib/CodeGen/CGBuiltin.cpp
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  llvm/docs/LangRef.rst
  llvm/include/llvm/IR/IRBuilder.h
  llvm/include/llvm/IR/IntrinsicInst.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
  llvm/test/CodeGen/X86/memcpy-inline.ll
  llvm/test/CodeGen/X86/memcpy.ll

Index: llvm/test/CodeGen/X86/memcpy.ll
===
--- llvm/test/CodeGen/X86/memcpy.ll
+++ llvm/test/CodeGen/X86/memcpy.ll
@@ -7,7 +7,7 @@
 
 
 ; Variable memcpy's should lower to calls.
-define i8* @test1(i8* %a, i8* %b, i64 %n) nounwind {
+define void @test1(i8* %a, i8* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test1:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -17,11 +17,11 @@
 ; DARWIN-NEXT:jmp _memcpy ## TAILCALL
 entry:
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 )
-	ret i8* %a
+  ret void
 }
 
 ; Variable memcpy's should lower to calls.
-define i8* @test2(i64* %a, i64* %b, i64 %n) nounwind {
+define void @test2(i64* %a, i64* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test2:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -33,7 +33,25 @@
 	%tmp14 = bitcast i64* %a to i8*
 	%tmp25 = bitcast i64* %b to i8*
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp14, i8* align 8 %tmp25, i64 %n, i1 0 )
-	ret i8* %tmp14
+  ret void
+}
+
+; Variable length memcpy's with disabled runtime should lower to repmovsb.
+define void @memcpy_no_runtime(i8* %a, i8* %b, i64 %n) nounwind {
+; LINUX-LABEL: memcpy_no_runtime:
+; LINUX:   # %bb.0: # %entry
+; LINUX-NEXT:movq %rdx, %rcx
+; LINUX-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; LINUX-NEXT:retq
+;
+; DARWIN-LABEL: memcpy_no_runtime:
+; DARWIN:   ## %bb.0: ## %entry
+; DARWIN-NEXT:movq %rdx, %rcx
+; DARWIN-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; DARWIN-NEXT:retq
+entry:
+	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 ) "no-builtin-memcpy"
+  ret void
 }
 
 ; Large constant memcpy's should lower to a call when optimizing for size.
Index: llvm/test/CodeGen/X86/memcpy-inline.ll
===
--- /dev/null
+++ llvm/test/CodeGen/X86/memcpy-inline.ll
@@ -0,0 +1,14 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 | FileCheck %s
+
+declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
+
+define void @test1(i8* %a, i8* %b) nounwind {
+; CHECK-LABEL: test1:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:movq (%rsi), %rax
+; CHECK-NEXT:movq %rax, (%rdi)
+; CHECK-NEXT:retq
+	tail call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* %a, i8* %b, i64 8, i1 0 )
+  ret void
+}
Index: llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
===
--- llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
+++ llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
@@ -314,5 +314,9 @@
   Size.getValueType(), Align, isVolatile,
   AlwaysInline, DstPtrInfo, SrcPtrInfo);
 
+  /// Handle runtime sizes through repmovsb when we AlwaysInline.
+  if (AlwaysInline)
+return emitRepmovs(Subtarget, DAG, dl, Chain, Dst, Src, Size, MVT::i8);
+
   return SDValue();
 }
Index: llvm/lib/IR/IRBuilder.cpp
===
--- llvm/lib/IR/IRBuilder.cpp
+++ llvm/lib/IR/IRBuilder.cpp
@@ -96,6 +96,14 @@
   return II;
 }
 
+static void ForwardAttribute(const Function *F, StringRef Attribute,
+ CallInst *CI) {
+  if (F->hasFnAttribute(Attribute)) {
+CI->addAttribute(AttributeList::FunctionIndex,
+ F->getFnAttribute(Attribute));
+  }
+}
+
 CallInst *IRBuilderBase::
 CreateMemSet(Value *Ptr, Value *Val, Value *Size, unsigned Align,
  bool isVolatile, MDNode *TBAATag, MDNode *ScopeTag,
@@ -103,7 +111,8 @@
   Ptr = getCastedInt8PtrValue(Ptr);
   Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};
   Type *Tys[] = { Ptr->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent(

[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-25 Thread Kristina Brooks via Phabricator via cfe-commits
kristina added a comment.

In D61634#1517228 , @xbolva00 wrote:

> I have a question about qsort: if we provide our own implementation of qsort and
> replace calls to libc's qsort with our qsort, we could then fully inline the cmp
> function. Ideas?


`qsort` would seem out of scope here since this is mostly about `string.h` 
style functions, more specifically `memcpy`/`memmove`. Things like `memset` 
would also fall under this category if covered in subsequent diffs.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-25 Thread Dávid Bolvanský via Phabricator via cfe-commits
xbolva00 added a comment.

I have a question about qsort: if we provide our own implementation of qsort and
replace calls to libc's qsort with our qsort, we could then fully inline the cmp
function. Ideas?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-23 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1512020 , @gchatelet wrote:

> AFAIU here is a coarse plan of what needs to happen
>
> 1. Add a `no-builtin` clang function attribute that has the same semantics as 
> the `no-builtin` cmd line argument 
> 
> 2. Propagate it to the IR.
>   - In the light of recent discussions and as @theraven suggested it seems 
> more appropriate to encode them as individual IR attributes (e.g. 
> `"no-builtin-memcpy"`, `"no-builtin-sqrt"`, etc..)
> 3. Propagate/merge the `no-builtin` IR attribute for LTO by "updating 
> `AttributeFuncs::areInlineCompatible` and/or 
> `AttributeFuncs::mergeAttributesForInlining` and adding a new MergeRule in 
> `include/llvm/IR/Attributes.td` and writing a function like 
> `adjustCallerStackProbeSize`."


This one isn't about LTO, but rather the inliner. You could have different 
functions in the same module even without LTO that have incompatible no-builtin 
attributes. There isn't any propagation required for LTO.

> 4. Get inspiration from `TargetTransformInfo` to get `TargetLibraryInfo` on a 
> per function basis for all passes and respect the IR attribute.
> 
>   I'm not familiar with 3 and 4 but I can definitely have a look. I'll update 
> this patch to do 1 and 2 in the meantime. @tejohnson let me know how you want 
> to proceed for your related patch . I'm 
> happy to help if I can.

I will mark that one obsolete. I can work on 4; it may just take some time to
get it all plumbed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-23 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet updated this revision to Diff 200998.
gchatelet added a comment.

- Use no-builtin instead of no-runtime-for.
- Use one attribute per runtime function to make merging easier.

The patch is still WIP and needs more work.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
  llvm/test/CodeGen/X86/memcpy.ll

Index: llvm/test/CodeGen/X86/memcpy.ll
===
--- llvm/test/CodeGen/X86/memcpy.ll
+++ llvm/test/CodeGen/X86/memcpy.ll
@@ -7,7 +7,7 @@
 
 
 ; Variable memcpy's should lower to calls.
-define i8* @test1(i8* %a, i8* %b, i64 %n) nounwind {
+define void @test1(i8* %a, i8* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test1:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -17,11 +17,11 @@
 ; DARWIN-NEXT:jmp _memcpy ## TAILCALL
 entry:
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 )
-	ret i8* %a
+  ret void
 }
 
 ; Variable memcpy's should lower to calls.
-define i8* @test2(i64* %a, i64* %b, i64 %n) nounwind {
+define void @test2(i64* %a, i64* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test2:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -33,7 +33,25 @@
 	%tmp14 = bitcast i64* %a to i8*
 	%tmp25 = bitcast i64* %b to i8*
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp14, i8* align 8 %tmp25, i64 %n, i1 0 )
-	ret i8* %tmp14
+  ret void
+}
+
+; Variable length memcpy's with disabled runtime should lower to repmovsb.
+define void @memcpy_no_runtime(i8* %a, i8* %b, i64 %n) nounwind {
+; LINUX-LABEL: memcpy_no_runtime:
+; LINUX:   # %bb.0: # %entry
+; LINUX-NEXT:movq %rdx, %rcx
+; LINUX-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; LINUX-NEXT:retq
+;
+; DARWIN-LABEL: memcpy_no_runtime:
+; DARWIN:   ## %bb.0: ## %entry
+; DARWIN-NEXT:movq %rdx, %rcx
+; DARWIN-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; DARWIN-NEXT:retq
+entry:
+	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 ) "no-builtin-memcpy"
+  ret void
 }
 
 ; Large constant memcpy's should lower to a call when optimizing for size.
Index: llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
===
--- llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
+++ llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
@@ -314,5 +314,9 @@
   Size.getValueType(), Align, isVolatile,
   AlwaysInline, DstPtrInfo, SrcPtrInfo);
 
+  /// Handle runtime sizes through repmovsb when we AlwaysInline.
+  if (AlwaysInline)
+return emitRepmovs(Subtarget, DAG, dl, Chain, Dst, Src, Size, MVT::i8);
+
   return SDValue();
 }
Index: llvm/lib/IR/IRBuilder.cpp
===
--- llvm/lib/IR/IRBuilder.cpp
+++ llvm/lib/IR/IRBuilder.cpp
@@ -96,6 +96,14 @@
   return II;
 }
 
+static void ForwardAttribute(const Function *F, StringRef Attribute,
+ CallInst *CI) {
+  if (F->hasFnAttribute(Attribute)) {
+CI->addAttribute(AttributeList::FunctionIndex,
+ F->getFnAttribute(Attribute));
+  }
+}
+
 CallInst *IRBuilderBase::
 CreateMemSet(Value *Ptr, Value *Val, Value *Size, unsigned Align,
  bool isVolatile, MDNode *TBAATag, MDNode *ScopeTag,
@@ -103,7 +111,8 @@
   Ptr = getCastedInt8PtrValue(Ptr);
   Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};
   Type *Tys[] = { Ptr->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent();
+  Function *F = BB->getParent();
+  Module *M = F->getParent();
   Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memset, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
@@ -121,6 +130,8 @@
   if (NoAliasTag)
 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
 
+  ForwardAttribute(F, "no-builtin-memset", CI);
+
   return CI;
 }
 
@@ -165,7 +176,8 @@
 
   Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
   Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent();
+  Function *F = BB->getParent();
+  Module *M = F->getParent();
   Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
@@ -190,6 +202,8 @@
   if (NoAliasTag)
 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
 
+  ForwardAttribute(F, "no-builtin-memcpy", CI);
+
   return CI;
 }
 
@@ -245,7 +259,8 @@
 
   Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
   Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };
-  Module *M = BB->getParent

[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-22 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

AFAIU here is a coarse plan of what needs to happen

1. Add a `no-builtin` clang function attribute that has the same semantics as 
the `no-builtin` cmd line argument 

2. Propagate it to the IR.
  - In the light of recent discussions and as @theraven suggested it seems more 
appropriate to encode them as individual IR attributes (e.g. 
`"no-builtin-memcpy"`, `"no-builtin-sqrt"`, etc..)
3. Propagate/merge the `no-builtin` IR attribute for LTO by "updating 
`AttributeFuncs::areInlineCompatible` and/or 
`AttributeFuncs::mergeAttributesForInlining` and adding a new MergeRule in 
`include/llvm/IR/Attributes.td` and writing a function like 
`adjustCallerStackProbeSize`."
4. Get inspiration from `TargetTransformInfo` to get `TargetLibraryInfo` on a 
per function basis for all passes and respect the IR attribute.

I'm not familiar with 3 and 4 but I can definitely have a look.
I'll update this patch to do 1 and 2 in the meantime. @tejohnson let me know
how you want to proceed for your related patch. I'm happy to help if I can.
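
As a concrete illustration of items 1 and 2, roughly what the source-level
attribute and the resulting IR attributes would look like (spellings as
proposed in this thread, so treat them as provisional):

  // Hypothetical usage of the proposed clang attribute (item 1). Clang would
  // translate each listed name into an individual IR function attribute
  // (item 2), e.g. "no-builtin-memcpy" and "no-builtin-memmove" on the IR
  // definition of naive_copy.
  __attribute__((no_builtin("memcpy", "memmove")))
  void naive_copy(char *dst, const char *src, unsigned long n) {
    // Must stay a byte loop: the attribute forbids turning it into a call to
    // the library memcpy/memmove.
    for (unsigned long i = 0; i < n; ++i)
      dst[i] = src[i];
  }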


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-22 Thread Chandler Carruth via Phabricator via cfe-commits
chandlerc added a comment.

Sorry I've been a bit slow to respond here...

In D61634#1503089 , @hfinkel wrote:

> In D61634#1502201 , @tejohnson wrote:
>
> > In D61634#1502138 , @hfinkel wrote:
> >
> > > In D61634#1502043 , @efriedma 
> > > wrote:
> > >
> > > > > I have a related patch that turns -fno-builtin* options into module 
> > > > > flags
> > > >
> > > > Do you have any opinion on representing -fno-builtin using a module 
> > > > flag vs. a function attribute in IR?  It seems generally more flexible 
> > > > and easier to reason about a function attribute from my perspective.  
> > > > But I might be missing something about the semantics of -fno-builtin 
> > > > that would make that representation awkward.  Or I guess it might just 
> > > > be more work to implement, given we have some IPO passes that use 
> > > > TargetLibraryInfo?
> > >
> > >
> > > I think that a function attribute would be better. We generally use these 
> > > flags only in the context of certain translation units, and when we use 
> > > LTO, it would be sad if we had to take the most-conservative settings 
> > > across the entire application. When we insert a new function call to a 
> > > standard library, we always do it in the context of some function. We'd 
> > > probably need to block inlining in some cases, but that's better than a 
> > > global conservative setting.
> >
>


FWIW, I definitely agree here. This really is the end state we're going to find 
ourselves in and we should probably go directly there.

>> Using function level attributes instead of module flags does provide finer 
>> grained control and avoids the conservativeness when merging IR for LTO. The 
>> downsides I see, mostly just in terms of the engineering effort to get this 
>> to work, are:
>> 
>> - need to prevent inlining with different attributes
> 
> I think that this should be relatively straightforward. You just need to 
> update `AttributeFuncs::areInlineCompatible` and/or 
> `AttributeFuncs::mergeAttributesForInlining` by adding a new MergeRule in 
> include/llvm/IR/Attributes.td and writing a function like 
> adjustCallerStackProbeSize.

+1

> 
> 
>> - currently the TargetLibraryInfo is constructed on a per-module basis. 
>> Presumably it would instead need to be created per Function - this one in 
>> particular seems like it would require fairly extensive changes.
> 
> Interesting point. The largest issue I see is that we need TLI available from 
> loop passes, etc., and we can't automatically recompute a function-level 
> analysis there. We need to make sure that it's always available and not 
> invalidated. TLI is one of those analysis passes, being derived only from 
> things which don't change (i.e., the target triple), or things that change 
> very rarely (e.g., function attributes), that we probably don't want to 
> require all passes to explicitly say that they preserve it (not that the 
> mechanical change to all existing passes is hard, but it's easy to forget), 
> so I think we'd like something like the CFG-only concept in the current 
> passes, but stronger and perhaps turned on by default, for this kind of 
> "attributes-only" pass. (@chandlerc , thoughts on this?).

Yep, this makes sense.

The new PM makes this quite easy. The analysis itself gets to implement the 
invalidation hook, and say "nope, I'm not invalidated". In fact, in the new PM, 
`TargetLibraryInfo` already works this way. We build an instance per function 
and say it is never invalidated.

However, they are just trivial wrappers around shared implementations, so it
will still require some non-trivial changes. We will need to remove the
module-based access and move clients over to provide a function when they query
it, etc.

IIRC, `TargetTransformInfo` already basically works exactly this way in both 
old and new PMs and we should be able to look at exactly the techniques it uses 
in both pass managers to build an effective way to manage these per-function.
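
For reference, a minimal sketch of the per-function query from a new-PM
function pass (standard new-pass-manager APIs; the relevant point for this
thread is that the analysis result is per Function, so it can later reflect
per-function attributes):

  #include "llvm/Analysis/TargetLibraryInfo.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"

  struct NoMemcpyIdiomSketch : llvm::PassInfoMixin<NoMemcpyIdiomSketch> {
    llvm::PreservedAnalyses run(llvm::Function &F,
                                llvm::FunctionAnalysisManager &FAM) {
      // TLI is computed per function here, which is what would let a
      // per-function "no-builtin-memcpy" attribute be honored.
      const llvm::TargetLibraryInfo &TLI =
          FAM.getResult<llvm::TargetLibraryAnalysis>(F);
      if (!TLI.has(llvm::LibFunc_memcpy)) {
        // memcpy is not a usable builtin for F: do not introduce calls to it.
      }
      return llvm::PreservedAnalyses::all();
    }
  };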


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-15 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a subscriber: chandlerc.
hfinkel added a comment.

In D61634#1502201 , @tejohnson wrote:

> In D61634#1502138 , @hfinkel wrote:
>
> > In D61634#1502043 , @efriedma 
> > wrote:
> >
> > > > I have a related patch that turns -fno-builtin* options into module 
> > > > flags
> > >
> > > Do you have any opinion on representing -fno-builtin using a module flag 
> > > vs. a function attribute in IR?  It seems generally more flexible and 
> > > easier to reason about a function attribute from my perspective.  But I 
> > > might be missing something about the semantics of -fno-builtin that would 
> > > make that representation awkward.  Or I guess it might just be more work 
> > > to implement, given we have some IPO passes that use TargetLibraryInfo?
> >
> >
> > I think that a function attribute would be better. We generally use these 
> > flags only in the context of certain translation units, and when we use 
> > LTO, it would be sad if we had to take the most-conservative settings 
> > across the entire application. When we insert a new function call to a 
> > standard library, we always do it in the context of some function. We'd 
> > probably need to block inlining in some cases, but that's better than a 
> > global conservative setting.
>
>
> Using function level attributes instead of module flags does provide finer 
> grained control and avoids the conservativeness when merging IR for LTO. The 
> downsides I see, mostly just in terms of the engineering effort to get this 
> to work, are:
>
> - need to prevent inlining with different attributes


I think that this should be relatively straightforward. You just need to update 
`AttributeFuncs::areInlineCompatible` and/or 
`AttributeFuncs::mergeAttributesForInlining` by adding a new MergeRule in 
include/llvm/IR/Attributes.td and writing a function like 
adjustCallerStackProbeSize.

> - currently the TargetLibraryInfo is constructed on a per-module basis. 
> Presumably it would instead need to be created per Function - this one in 
> particular seems like it would require fairly extensive changes.

Interesting point. The largest issue I see is that we need TLI available from 
loop passes, etc., and we can't automatically recompute a function-level 
analysis there. We need to make sure that it's always available and not 
invalidated. TLI is one of those analysis passes, being derived only from 
things which don't change (i.e., the target triple), or things that change very 
rarely (e.g., function attributes), that we probably don't want to require all 
passes to explicitly say that they preserve it (not that the mechanical change 
to all existing passes is hard, but it's easy to forget), so I think we'd like 
something like the CFG-only concept in the current passes, but stronger and 
perhaps turned on by default, for this kind of "attributes-only" pass. 
(@chandlerc , thoughts on this?).
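
A rough sketch of the kind of compatibility rule being suggested (an
illustration only, not the eventual LLVM implementation; the helper name and
the attribute list are made up for the example):

  #include "llvm/ADT/StringRef.h"
  #include "llvm/IR/Function.h"

  // Inlining must not drop a restriction that only the callee carries.
  // A real merge rule would cover every library function and be driven from
  // Attributes.td next to AttributeFuncs::areInlineCompatible.
  static bool noBuiltinAttrsCompatible(const llvm::Function &Caller,
                                       const llvm::Function &Callee) {
    for (llvm::StringRef Name : {"no-builtins", "no-builtin-memcpy",
                                 "no-builtin-memset", "no-builtin-memmove"})
      if (Callee.hasFnAttribute(Name) && !Caller.hasFnAttribute(Name))
        return false; // inlining would lose the callee's no-builtin guarantee
    return true;
  }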


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-15 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1502685 , @gchatelet wrote:

> In D61634#1502201 , @tejohnson wrote:
>
> > Using function level attributes instead of module flags does provide finer 
> > grained control and avoids the conservativeness when merging IR for LTO. 
> > The downsides I see, mostly just in terms of the engineering effort to get 
> > this to work, are:
> >
> > - need to prevent inlining with different attributes
>
>
> IIUC this is needed regardless of the proposed change. Correct?


With the module flags approach, no - because there won't be fine grained enough 
info to do so. Any merged IR will need to use the conservatively set merged 
flag for the whole module. Or did you mean in comparison with the approach in 
this patch? I haven't looked at this one in any detail yet.

> 
> 
>> - currently the TargetLibraryInfo is constructed on a per-module basis. 
>> Presumably it would instead need to be created per Function - this one in 
>> particular seems like it would require fairly extensive changes.
> 
> Yes this one is a bit worrying.
>  I think we can discard right away any solution that would mutate or create a 
> TLI on a per function basis.
>  Another design would be something like the following:
> 
>   auto FunctionTLI = ModuleTLI.createCustomizedTLI(F);
> 
> 
> where `FunctionTLI` is itself a `TargetLibraryInfo` and calls to 
> `FunctionTLI` would either return the function customizations or delegate to 
> the module's TLI. WDYT?

I don't think this makes it much easier - all TLI users still need to be taught
to get and use this new Function-level TLI. I guess for Function (or lower)
passes it would be fairly straightforward, but for things like Module-level or
SCC passes it will require more wiring to ensure that the FunctionTLI is
accessed in the appropriate places. Doable, just a lot of manual changes.

> I'm unsure if we want to support function level attribute right away or if 
> it's OK to be in an intermediate state with only module level attributes.

I'm interested in thoughts from other developers here. The module flag change
is straightforward, but unnecessary churn if we want to go the function
attribute route, which, despite the work, seems like the best long-term
approach...
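
For module-level passes in the new PM, the extra wiring would look roughly like
the sketch below (using the standard FunctionAnalysisManagerModuleProxy; how
much churn this means for existing module/SCC passes is exactly the open
question here):

  #include "llvm/Analysis/TargetLibraryInfo.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IR/PassManager.h"

  struct ModuleLevelTLISketch : llvm::PassInfoMixin<ModuleLevelTLISketch> {
    llvm::PreservedAnalyses run(llvm::Module &M,
                                llvm::ModuleAnalysisManager &MAM) {
      // Reach the per-function analyses from a module pass.
      auto &FAM = MAM.getResult<llvm::FunctionAnalysisManagerModuleProxy>(M)
                      .getManager();
      for (llvm::Function &F : M) {
        if (F.isDeclaration())
          continue;
        // Consult the function's TLI before introducing any libcalls for it.
        const llvm::TargetLibraryInfo &TLI =
            FAM.getResult<llvm::TargetLibraryAnalysis>(F);
        (void)TLI;
      }
      return llvm::PreservedAnalyses::all();
    }
  };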


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-15 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1502201 , @tejohnson wrote:

> Using function level attributes instead of module flags does provide finer 
> grained control and avoids the conservativeness when merging IR for LTO. The 
> downsides I see, mostly just in terms of the engineering effort to get this 
> to work, are:
>
> - need to prevent inlining with different attributes


IIUC this is needed regardless of the proposed change. Correct?

> - currently the TargetLibraryInfo is constructed on a per-module basis. 
> Presumably it would instead need to be created per Function - this one in 
> particular seems like it would require fairly extensive changes.

Yes, this one is a bit worrying.
I think we can discard right away any solution that would mutate or create a
TLI on a per-function basis.
Another design could be the following:

  auto FunctionTLI = ModuleTLI.createCustomizedTLI(F);

`FunctionTLI` would either return the function customizations or delegate to
the module's TLI. WDYT?

I'm unsure if we want to support function-level attributes right away or if it's
OK to be in an intermediate state with only module-level attributes.
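
To make the createCustomizedTLI idea above a bit more concrete, a rough sketch
of such a delegating view (hypothetical API and class name, just to illustrate
the intended behavior):

  #include "llvm/Analysis/TargetLibraryInfo.h"
  #include "llvm/IR/Function.h"

  // Answer per-function no-builtin-* overrides first, otherwise defer to the
  // module-level TargetLibraryInfo.
  class FunctionTLIView {
    const llvm::TargetLibraryInfo &ModuleTLI;
    const llvm::Function &F;

  public:
    FunctionTLIView(const llvm::TargetLibraryInfo &TLI, const llvm::Function &Fn)
        : ModuleTLI(TLI), F(Fn) {}

    bool has(llvm::LibFunc LF) const {
      // Only memcpy is shown; a real version would map every LibFunc to its
      // "no-builtin-<name>" string attribute.
      if (LF == llvm::LibFunc_memcpy && F.hasFnAttribute("no-builtin-memcpy"))
        return false;
      return ModuleTLI.has(LF);
    }
  };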


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-14 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1502138 , @hfinkel wrote:

> In D61634#1502043 , @efriedma wrote:
>
> > > I have a related patch that turns -fno-builtin* options into module flags
> >
> > Do you have any opinion on representing -fno-builtin using a module flag 
> > vs. a function attribute in IR?  It seems generally more flexible and 
> > easier to reason about a function attribute from my perspective.  But I 
> > might be missing something about the semantics of -fno-builtin that would 
> > make that representation awkward.  Or I guess it might just be more work to 
> > implement, given we have some IPO passes that use TargetLibraryInfo?
>
>
> I think that a function attribute would be better. We generally use these 
> flags only in the context of certain translation units, and when we use LTO, 
> it would be sad if we had to take the most-conservative settings across the 
> entire application. When we insert a new function call to a standard library, 
> we always do it in the context of some function. We'd probably need to block 
> inlining in some cases, but that's better than a global conservative setting.


Using function level attributes instead of module flags does provide finer 
grained control and avoids the conservativeness when merging IR for LTO. The 
downsides I see, mostly just in terms of the engineering effort to get this to 
work, are:

- need to prevent inlining with different attributes
- currently the TargetLibraryInfo is constructed on a per-module basis. 
Presumably it would instead need to be created per Function - this one in 
particular seems like it would require fairly extensive changes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-14 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

In D61634#1502043 , @efriedma wrote:

> > I have a related patch that turns -fno-builtin* options into module flags
>
> Do you have any opinion on representing -fno-builtin using a module flag vs. 
> a function attribute in IR?  It seems generally more flexible and easier to 
> reason about a function attribute from my perspective.  But I might be 
> missing something about the semantics of -fno-builtin that would make that 
> representation awkward.  Or I guess it might just be more work to implement, 
> given we have some IPO passes that use TargetLibraryInfo?


I think that a function attribute would be better. We generally use these flags 
only in the context of certain translation units, and when we use LTO, it would 
be sad if we had to take the most-conservative settings across the entire 
application. When we insert a new function call to a standard library, we always 
do it in the context of some function. We'd probably need to block inlining in 
some cases, but that's better than a global conservative setting.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-14 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

> I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a 
function attribute in IR?  It seems generally more flexible and easier to 
reason about a function attribute from my perspective.  But I might be missing 
something about the semantics of -fno-builtin that would make that 
representation awkward.  Or I guess it might just be more work to implement, 
given we have some IPO passes that use TargetLibraryInfo?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-14 Thread Teresa Johnson via Phabricator via cfe-commits
tejohnson added a comment.

In D61634#1501066 , @gchatelet wrote:

> In D61634#1500453 , @efriedma wrote:
>
> > My main blocker is that I want to make sure we're moving in the right 
> > direction: towards LLVM IR with clear semantics, towards straightforward 
> > rules for writing freestanding C code, and towards solutions which behave 
> > appropriately for all targets.  There's clearly a problem here that's worth 
> > solving, but I want to make sure we're adding small features that interact 
> > with existing features in an obvious way.  The patch as written is adding a 
> > new attribute that changes the semantics of C and LLVM IR in a subtle way 
> > that interacts with existing optimizations and lowering in ways that are 
> > complex and hard to understand.
>
>
> This makes a lot of sense, I'm totally on board to reduce entropy and 
> confusion along the way.
>
> > I don't want to mix general restrictions on calling C library functions, 
> > with restrictions that apply specifically to memcpy: clang generally works 
> > on the assumption that a "memcpy" symbol is available in freestanding 
> > environments, and we don't really want to change that.
> > 
> > With -fno-builtin, specifically, we currently apply the restriction that 
> > optimizations will not introduce memcpy calls that would not exist with 
> > optimization disabled.  This is sort of artificial, and maybe a bit 
> > confusing, but it seems to work well enough in practice.  gcc does 
> > something similar.
> > 
> > I don't really want to add an attribute that has a different meaning from 
> > -fno-builtin.  An attribute that has the same meaning is a lot less 
> > problematic: it reduces potential confusion, and solves a related problem 
> > that -fno-builtin currently doesn't really work with LTO, because we don't 
> > encode it into the IR.
>
> Adding @tejohnson to the conversation.
>
> IIUC you're offering to introduce something like 
> `__attribute__((no-builtin-memcpy))` or 
> `__attribute__((no-builtin("memcpy")))`.
>  As attributes they would compose nicely with (Thin)LTO.
>
> I believe we still want to turn `-fno-builtin` flags into attributes so
> there's no impedance mismatch between the flag and the attribute, right?


I have a related patch that turns -fno-builtin* options into module flags, and 
uses them in the LTO backends. This addresses current issues with these flags 
not working well with LTO.
See https://reviews.llvm.org/D60162


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634





[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-14 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a subscriber: tejohnson.
gchatelet added a comment.

In D61634#1500453 , @efriedma wrote:

> My main blocker is that I want to make sure we're moving in the right 
> direction: towards LLVM IR with clear semantics, towards straightforward 
> rules for writing freestanding C code, and towards solutions which behave 
> appropriately for all targets.  There's clearly a problem here that's worth 
> solving, but I want to make sure we're adding small features that interact 
> with existing features in an obvious way.  The patch as written is adding a 
> new attribute that changes the semantics of C and LLVM IR in a subtle way 
> that interacts with existing optimizations and lowering in ways that are 
> complex and hard to understand.


This makes a lot of sense, I'm totally on board to reduce entropy and confusion 
along the way.

> I don't want to mix general restrictions on calling C library functions, with 
> restrictions that apply specifically to memcpy: clang generally works on the 
> assumption that a "memcpy" symbol is available in freestanding environments, 
> and we don't really want to change that.
> 
> With -fno-builtin, specifically, we currently apply the restriction that 
> optimizations will not introduce memcpy calls that would not exist with 
> optimization disabled.  This is sort of artificial, and maybe a bit 
> confusing, but it seems to work well enough in practice.  gcc does something 
> similar.
> 
> I don't really want to add an attribute that has a different meaning from 
> -fno-builtin.  An attribute that has the same meaning is a lot less 
> problematic: it reduces potential confusion, and solves a related problem 
> that -fno-builtin currently doesn't really work with LTO, because we don't 
> encode it into the IR.

Adding @tejohnson to the conversation.

IIUC you're offering to introduce something like 
`__attribute__((no-builtin-memcpy))` or `__attribute__((no-builtin("memcpy")))`.
As attributes they would compose nicely with (Thin)LTO.

I believe we still want to turn `-fno-builtin` flags into attributes so there's
no impedance mismatch between the flag and the attribute, right?

> Your current patch is using the "AlwaysInline" flag for 
> SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline 
> implementation.  In general, this flag is only supported for constant-size 
> memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not 
> have some special lowering, like the x86 "rep;movs", you'll hit an assertion 
> failure.  And we don't really want to add an implementation of 
> variable-length memcpy for a lot of targets; it's very inefficient on targets 
> which don't have unaligned memory access.  You could try to narrowly fix this 
> to only apply "AlwaysInline" if the size is a constant integer, but there's a 
> related problem: even if a memcpy is constant size in C, optimizations don't 
> promise to preserve that, in general.  We could theoretically add such a 
> promise, I guess, but that's awkward: it would conditionally apply to 
> llvm.memcpy where the parameter is already const, so it's not something we 
> can easily verify.

Fair enough. This patch was really to get the conversation started: I was
myself not especially convinced about the approach. Hijacking the
`AlwaysInline` parameter was a shortcut that would not work for the other mem
functions anyway.

> If we added a new builtin `llvm.memcpy.inline`, or something like that, the 
> end result ends up being simpler in many ways. It has obvious rules for both 
> generating it and lowering it, which don't depend on attributes in the parent 
> function.  We could mark the size as "immarg", so we don't have to add 
> special optimization rules for it.   And if we have it, we have a "complete" 
> solution for writing memcpy implementations in C: we make `__builtin_memcpy`, 
> or a new `__builtin_inline_memcpy`, produce it, and it can be combined with 
> -fno-builtin to ensure we don't produce any calls to the "real" memcpy.  The 
> downside of a new builtin, of course, is that we now have two functions with 
> similar semantics, and potentially have to fix a bunch of places to check for 
> both of them.

This was one of the approaches I envisioned; it's much cleaner but also a lot
more work :)
I'm willing to go that route, knowing that it would also work for @theraven's
use case.
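
As a sketch of what that route could look like from C (everything here is an
assumption: a builtin named `__builtin_memcpy_inline` that lowers to such an
`llvm.memcpy.inline` intrinsic, takes a compile-time-constant size, and is
guaranteed never to become a call into the C library):

  /* Hypothetical: always expands to straight-line code, never to a call. */
  static inline void copy_block_16(char *dst, const char *src) {
    __builtin_memcpy_inline(dst, src, 16);  /* size must be a constant */
  }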

> MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops 
> into llvm.memcpy).  If we want it to perform some transform in circumstances 
> where the call "memcpy" isn't available, we have to make sure to restrict it 
> based on the target: in the worst case, on some targets without unaligned 
> memory access, it could just act as a low-quality loop unroller.  This 
> applies no matter how the result is actually represented in IR.

Yes, if we have to generate loops it needs to happen before SelectionDAG.



[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-13 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

My main blocker is that I want to make sure we're moving in the right 
direction: towards LLVM IR with clear semantics, towards straightforward rules 
for writing freestanding C code, and towards solutions which behave 
appropriately for all targets.  There's clearly a problem here that's worth 
solving, but I want to make sure we're adding small features that interact with 
existing features in an obvious way.  The patch as written is adding a new 
attribute that changes the semantics of C and LLVM IR in a subtle way that 
interacts with existing optimizations and lowering in ways that are complex and 
hard to understand.

I don't want to mix general restrictions on calling C library functions, with 
restrictions that apply specifically to memcpy: clang generally works on the 
assumption that a "memcpy" symbol is available in freestanding environments, 
and we don't really want to change that.

With -fno-builtin, specifically, we currently apply the restriction that 
optimizations will not introduce memcpy calls that would not exist with 
optimization disabled.  This is sort of artificial, and maybe a bit 
confusing, but it seems to work well enough in practice.  gcc does something 
similar.

I don't really want to add an attribute that has a different meaning from 
-fno-builtin.  An attribute that has the same meaning is a lot less 
problematic: it reduces potential confusion, and solves a related problem that 
-fno-builtin currently doesn't really work with LTO, because we don't encode it 
into the IR.

Your current patch is using the "AlwaysInline" flag for 
SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline 
implementation.  In general, this flag is only supported for constant-size 
memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not have 
some special lowering, like the x86 "rep;movs", you'll hit an assertion 
failure.  And we don't really want to add an implementation of variable-length 
memcpy for a lot of targets; it's very inefficient on targets which don't have 
unaligned memory access.  You could try to narrowly fix this to only apply 
"AlwaysInline" if the size is a constant integer, but there's a related 
problem: even if a memcpy is constant size in C, optimizations don't promise to 
preserve that, in general.  We could theoretically add such a promise, I guess, 
but that's awkward: it would conditionally apply to llvm.memcpy where the 
parameter is already const, so it's not something we can easily verify.

If we added a new builtin `llvm.memcpy.inline`, or something like that, the end 
result ends up being simpler in many ways. It has obvious rules for both 
generating it and lowering it, which don't depend on attributes in the parent 
function.  We could mark the size as "immarg", so we don't have to add special 
optimization rules for it.   And if we have it, we have a "complete" solution 
for writing memcpy implementations in C: we make `__builtin_memcpy`, or a new 
`__builtin_inline_memcpy`, produce it, and it can be combined with -fno-builtin 
to ensure we don't produce any calls to the "real" memcpy.  The downside of a 
new builtin, of course, is that we now have two functions with similar 
semantics, and potentially have to fix a bunch of places to check for both of 
them.

MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops into 
llvm.memcpy).  If we want it to perform some transform in circumstances where 
the call "memcpy" isn't available, we have to make sure to restrict it based on 
the target: in the worst case, on some targets without unaligned memory access, 
it could just act as a low-quality loop unroller.  This applies no matter how 
the result is actually represented in IR.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-13 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1498376 , @efriedma wrote:

> I still think there are really two things you're trying to accomplish here, 
> which should be handled separately.
>
> 1. Add a function attribute that works like -fno-builtin-memcpy currently 
> does, to prevent the compiler from synthesizing calls to memcpy.
> 2. Provide a convenient way to write a small, fixed-length "memcpy", in C and 
> in LLVM IR, that is always expanded to optimal straight-line code (without 
> any calls or loops).
>
>   These correspond to proposals (1) and (2) from your RFC; I think we need 
> both to arrive at a reasonable final state.
>
>   (The "rep; movs" is a third thing, which I think should also be handled 
> separately, but it sounds like you don't think that's very important.)


Thank you for taking the time to comment, your feedback is highly appreciated.

I understand that your main concern is about conflating two orthogonal 
requirements (namely 1. and 2.) in a single attribute. Is that correct?
From my point of view, the RFC (and this patch) really is about 1., because 2.
can already be achieved with the `__builtin_memcpy(dst, src, )` builtin.
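
A minimal illustration of that second point (the constant 8 below is just an
example size I picked):

  /* With a compile-time-constant size, __builtin_memcpy typically expands to
     straight-line loads/stores when optimizing -- no loop and no libc call. */
  static inline void copy8(void *dst, const void *src) {
    __builtin_memcpy(dst, src, 8);
  }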

My secondary goals in decreasing priority order are:

- do not change the semantics of current builtins,
- create a relatively straightforward LLVM patch that does not touch too much
code and that is easy to review,
- do not add more confusion around `-fno-builtin-*` meaning,
- do not add more builtins or IR intrinsics.

What is the main blocker on your end? What would you suggest so we can move 
forward?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-10 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

> -fno-builtin* is about preventing clang/llvm from recognizing that a piece of 
> code has the same semantic as a particular IR intrinsic, it has nothing to do 
> with preventing the compiler from generating runtime calls.

It has a dual purpose for C library functions.  One, it prevents the compiler 
from assuming an explicitly written call to that function has some particular 
semantics.  Two, it prevents the compiler from assuming the underlying library 
function exists for the purpose of optimization.  (These are sort of 
intertwined, but they both matter in this context.)

> I believe very few people will use the attribute described in the RFC, it 
> will most probably be library maintainers that already know a good deal of 
> how the compiler is allowed to transform the code.

Sure, I'm happy to assume that memcpy/memset implementations are written using 
some appropriate subset of C.  (We should probably document that subset at some 
point.)

--

I still think there are really two things you're trying to accomplish here, 
which should be handled separately.

1. Add a function attribute that works like -fno-builtin-memcpy currently does, 
to prevent the compiler from synthesizing calls to memcpy.
2. Provide a convenient way to write a small, fixed-length "memcpy", in C and 
in LLVM IR, that is always expanded to optimal straight-line code (without any 
calls or loops).

These correspond to proposals (1) and (2) from your RFC; I think we need both 
to arrive at a reasonable final state.

(The "rep; movs" is a third thing, which I think should also be handled 
separately, but it sounds like you don't think that's very important.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-10 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1494518 , @alexandre.isoard 
wrote:

> I'm not convinced by the approach.
>
> Can it still recognize the loop idiom into memcpy implementation but use 
> `memmove` (as only `memcpy` has been blacklisted)?


Yes it can, and that's fine: passing the `src` and `dst` arguments as
`__restrict` ensures that `memcpy` is the function to choose in this case.
This attribute is not meant to be used widely; it is directed towards library
maintainers who already know what the compiler is allowed to do.
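
To make the `memcpy` vs. `memmove` point concrete, here is a small sketch (my
own example, not code from the patch):

  #include <stddef.h>

  /* The restrict qualifiers promise the buffers do not overlap, so the loop
     idiom recognizer is free to pick memcpy rather than the more conservative
     memmove for this copy loop. */
  void copy_bytes(char *restrict dst, const char *restrict src, size_t n) {
    for (size_t i = 0; i < n; ++i)
      dst[i] = src[i];
  }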

> Can it do the same for `memmove` (for the case without overlap), and end-up 
> with infinite recursion?

I fail to see how this would happen. Could you craft some pseudo code that 
would exhibit the infinite recursion?
My take on it is that if the compiler can't generate the code it's totally OK 
to fail with an error message.

The purpose of this RFC is to get some help from the compiler to generate the
best code for some building blocks (say `copy 4 bytes`, `copy 8 bytes`) and to
assemble them in a way that optimizes for some metric (code size, runtime for a
certain parameter distribution).
It would be useful only to libc / libm / compiler-rt implementers, to build
these functions on top of smaller functions that we know the compiler can
produce at compile time.

I don't think we want to use this attribute to fully generate `memcpy`. I
added the expansion into `rep;movsb` for variable-sized memcpy only because
it's feasible on x86. It's preferable in this case since it has the correct
semantics and avoids a compilation error, but I would also be fine with a
compilation error here.

> I have a feeling we should pick a stance:
> 
> - either not allow the compiler to lower a builtin to a call to the library; 
> (my preferred choice, but I'm biased)

From the point of view of LLVM, the `@llvm.memcpy` intrinsic has the semantics
of `memcpy`; it does not know whether it comes from a lowering in the frontend
(e.g. passing structures by value) or from a builtin.
I believe this is feasible, although I fail to see where my proposal falls
short. Can you show a code example where the current proposal is problematic?
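
For illustration, both of the following usually reach LLVM as the same
`@llvm.memcpy` intrinsic (the 256-byte struct size is just an example):

  struct big { char bytes[256]; };

  void assign(struct big *dst, const struct big *src) {
    *dst = *src;                      /* frontend lowers the aggregate copy */
  }

  void explicit_copy(void *dst, const void *src) {
    __builtin_memcpy(dst, src, 256);  /* the builtin lowers to the same intrinsic */
  }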

> - or not allow the library to use compiler builtins (but LTO flow with the 
> runtime library *already* linked smells like trouble if we go this way).

This currently generates very poor code because `-fno-builtin` prevents LLVM
from understanding the copy semantics.

> The reason for my bias is that I have a multi-memcpy codegen in the compiler 
> to generate those two calls:
> 
>   memcpy(left,  in, n);
>   memcpy(right, in, n);
> 
> 
> with a single loop.

I don't quite understand how this is linked to the issue at hand. Can you 
provide more context? Pointers to code?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-10 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1493927 , @efriedma wrote:

> I would be careful about trying to over-generalize here.  There are a few 
> different related bits of functionality which seem to be interesting, given 
> the discussion in the llvm-dev thread, here, and in related patches:


Thx for the feedback @efriedma. I don't fully understand what you're suggesting
here, so I will try to reply inline.

> 1. The ability to specify -fno-builtin* on a per-function level, using 
> function attributes.

`-fno-builtin*` is about preventing clang/llvm from recognizing that a piece of
code has the same semantics as a particular IR intrinsic; it has nothing to do
with preventing the compiler from generating runtime calls.

- `-fno-builtin` is about the transformation from code to IR (frontend)
- The RFC is about the transformation from IR to runtime calls (backend)

> 2. Improved optimization when -fno-builtin-memcpy is specified.

I don't see this happening, because if `-fno-builtin-memcpy` is used, clang
(frontend) might already have unrolled and vectorized the loop. It is then very
hard - by simply looking at the IR - to recognize that it's a `memcpy` and to
generate good code (e.g. https://godbolt.org/z/JZ-mR0).
Here we really want the compiler to understand that we are copying memory (i.e.
this really has `@llvm.memcpy` semantics), but we want to prevent it from
calling the runtime.

> 3. The ability to avoid calls to memcpy for certain C constructs which would 
> naturally be lowered to a memcpy call, like struct assignment of large 
> structs, or explicit calls to __builtin_memcpy().  Maybe also some 
> generalization of this involving other libc/libm/compiler-rt calls.

I believe very few people will use the attribute described in the RFC; it will
most probably be library maintainers who already know a good deal about how the
compiler is allowed to transform the code.

> 4. The ability to force the compiler to generate "rep; movs" on x86 without 
> inline asm.

This is not strictly required - at least it is not very useful for the purpose
of building memcpy functions (more on this a few lines below).
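
For context, forcing `rep;movsb` without compiler support is already possible
with a short piece of inline asm; a rough x86-64 sketch (my own, not part of
the patch):

  #include <stddef.h>

  static inline void repmovsb(void *dst, const void *src, size_t n) {
    /* rep movsb expects dst in RDI, src in RSI and the count in RCX. */
    asm volatile("rep movsb"
                 : "+D"(dst), "+S"(src), "+c"(n)
                 :
                 : "memory");
  }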

> It's not clear to me that all of this should be tied together.  In 
> particular, I'm not sure -fno-builtin-memcpy should imply the compiler never 
> generates a call to memcpy().

As a matter of fact, those are not tied together. There are different use cases
with different solutions; the one I'm focusing on here is about preventing the
compiler from synthesizing runtime calls, because we want to be able to
implement those functions directly in C / C++.
This is orthogonal to having the compiler recognize a piece of code as an IR
intrinsic.

> On recent x86 chips, you might be able to get away with unconditionally using 
> "rep movs", but generally an efficient memcpy for more than a few bytes is a 
> lot longer than one instruction, and is not something reasonable for the 
> compiler to synthesize inline.

Well, it depends. On Haswell and particularly on Skylake it's hard to beat
`rep;movsb` for anything bigger than 1k, aligned or not.
On other architectures, and especially on the ones without ERMSB, you need
different strategies. Actually, this is the very goal of this RFC: if you can
inline or use PGO, you can do a much better job for small sizes than calling
libc's memcpy or inserting `rep;movsb`.
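
The "building blocks" idea could look roughly like the sketch below; the size
buckets and the overlapping-copy trick are illustrative choices of mine, not
something prescribed by the RFC:

  #include <stddef.h>

  static inline void copy4(char *d, const char *s) { __builtin_memcpy(d, s, 4); }
  static inline void copy8(char *d, const char *s) { __builtin_memcpy(d, s, 8); }

  void *small_memcpy(void *dst, const void *src, size_t n) {
    char *d = dst;
    const char *s = src;
    if (n < 4) {
      /* handle 0..3 bytes (omitted) */
    } else if (n <= 8) {   /* 4 <= n <= 8: two possibly-overlapping 4-byte copies */
      copy4(d, s);
      copy4(d + n - 4, s + n - 4);
    } else if (n <= 16) {  /* 8 < n <= 16: same trick with 8-byte blocks */
      copy8(d, s);
      copy8(d + n - 8, s + n - 8);
    }                      /* larger size buckets omitted */
    return dst;
  }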

> If we're adding new IR attributes here, we should also consider the 
> interaction with LTO.

Yes, this is a very different story; that's why I'm not exploring this route.
It's quite possible that it would come with a high maintenance cost as well.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-07 Thread Alexandre Isoard via Phabricator via cfe-commits
alexandre.isoard added a comment.

I'm not convinced by the approach.

Can it still recognize the loop idiom into memcpy implementation but use 
`memmove` (as only `memcpy` has been blacklisted)?
Can it do the same for `memmove` (for the case without overlap), and end-up 
with infinite recursion?

I have a feeling we should pick a stance:

- either not allow the compiler to lower a builtin to a call to the library; 
(my preferred choice, but I'm biased)
- or not allow the library to use compiler builtins (but LTO flow with the 
runtime library *already* linked smells like trouble if we go this way).

The reason for my bias is that I have a multi-memcpy codegen in the compiler to 
generate those two calls:

  memcpy(left,  in, n);
  memcpy(right, in, n);

with a single loop.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-07 Thread Eli Friedman via Phabricator via cfe-commits
efriedma added a comment.

I would be careful about trying to over-generalize here.  There are a few 
different related bits of functionality which seem to be interesting, given the 
discussion in the llvm-dev thread, here, and in related patches:

1. The ability to specify -fno-builtin* on a per-function level, using function 
attributes.
2. Improved optimization when -fno-builtin-memcpy is specified.
3. The ability to avoid calls to memcpy for certain C constructs which would 
naturally be lowered to a memcpy call, like struct assignment of large structs, 
or explicit calls to __builtin_memcpy().  Maybe also some generalization of 
this involving other libc/libm/compiler-rt calls.
4. The ability to force the compiler to generate "rep; movs" on x86 without 
inline asm.

It's not clear to me that all of this should be tied together.  In particular, 
I'm not sure -fno-builtin-memcpy should imply the compiler never generates a 
call to memcpy().  On recent x86 chips, you might be able to get away with 
unconditionally using "rep movs", but generally an efficient memcpy for more 
than a few bytes is a lot longer than one instruction, and is not something 
reasonable for the compiler to synthesize inline.

If we're adding new IR attributes here, we should also consider the interaction 
with LTO.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-07 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet added a comment.

In D61634#1493350 , @theraven wrote:

> I wonder if a list of comma-separated names is the right approach.  Would it 
> make more sense to add a new attribute for each of the helpers, such as 
> `#no-runtime-for-memcpy`? That should make querying simpler (one lookup in 
> the table, rather than a lookup and a string scan) and also make it easier to 
> add and remove attributes (merging is now just a matter of trying to add all 
> of them, with the existing logic to deduplicate repeated attributes working).


So I decided to go with the comma-separated list for two reasons:

- The `no_runtime_for` attribute will be used almost exclusively by runtime
implementers; on average the lookup will return false, and the parsing cost
should be marginal (famous last words?)
- We need to support every function in TargetLibraryInfo (I count 434 of them),
and I'm not sure adding one attribute per function will scale or stay in sync.

Indeed, I haven't thought about merging yet; maybe we can reuse or clone the
approach used for `target-features`?
For instance, some backends disable specific functions, and we may want to warn
the user if that happens, e.g. `setLibcallName(RTLIB::SHL_I128, nullptr);` in
X86ISelLowering.cpp.


I'm not saying one attribute per helper is not feasible but I'd like to put it 
into perspective with other constraints.

> We probably need to also carefully audit all of the transform passes to find 
> ones that insert calls.  Last time I looked, there were four places in LLVM 
> where memcpy could be expanded, I wonder if there are a similar number where 
> it can be synthesised...

Yes indeed, it's going to be long and painful. Here are the functions calling
`CallLoweringInfo::setLibCallee`:

**llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp**

- ExpandLibCall(LC, Node, isSigned)
- ExpandChainLibCall(LC, Node, isSigned)
- ExpandDivRemLibCall(Node, Results)
- ExpandSinCosLibCall(Node, Results)
- ConvertNodeToLibcall(Node)
- ConvertNodeToLibcall(Node)

**llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp**

- ExpandIntRes_XMULO(N, Lo, Hi)

**llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp**

- ExpandChainLibCall(LC, Node, isSigned)

**llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp**

- getMemcpy(Chain, dl, Dst, Src, Size, Align, isVol, AlwaysInline, isTailCall, 
DstPtrInfo, SrcPtrInfo)
- getAtomicMemcpy(Chain, dl, Dst, DstAlign, Src, SrcAlign, Size, SizeTy, 
ElemSz, isTailCall, DstPtrInfo, SrcPtrInfo)
- getMemmove(Chain, dl, Dst, Src, Size, Align, isVol, isTailCall, DstPtrInfo, 
SrcPtrInfo)
- getAtomicMemmove(Chain, dl, Dst, DstAlign, Src, SrcAlign, Size, SizeTy, 
ElemSz, isTailCall, DstPtrInfo, SrcPtrInfo)
- getMemset(Chain, dl, Dst, Src, Size, Align, isVol, isTailCall, DstPtrInfo)
- getAtomicMemset(Chain, dl, Dst, DstAlign, Value, Size, SizeTy, ElemSz, 
isTailCall, DstPtrInfo)

**llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp**

- visitIntrinsicCall(I, Intrinsic)

**llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp**

- makeLibCall(DAG, LC, RetVT, Ops, isSigned, dl, doesNotReturn, 
isReturnValueUsed, isPostTypeLegalization)
- LowerToTLSEmulatedModel(GA, DAG)

**llvm/lib/Target/AArch64/AArch64ISelLowering.cpp**

- LowerFSINCOS(Op, DAG)

**llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp**

- EmitTargetCodeForMemset(DAG, dl, Chain, Dst, Src, Size, Align, isVolatile, 
DstPtrInfo)

**llvm/lib/Target/ARM/ARMISelLowering.cpp**

- LowerToTLSGeneralDynamicModel(GA, DAG)

**llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp**

- EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size, Align, LC)

**llvm/lib/Target/Hexagon/HexagonSelectionDAGInfo.cpp**

- EmitTargetCodeForMemcpy(DAG, dl, Chain, Dst, Src, Size, Align, isVolatile, 
AlwaysInline, DstPtrInfo, SrcPtrInfo)

**llvm/lib/Target/PowerPC/PPCISelLowering.cpp**

- LowerINIT_TRAMPOLINE(Op, DAG)

**llvm/lib/Target/X86/X86ISelLowering.cpp**

- LowerWin64_i128OP(Op, DAG)
- LowerFSINCOS(Op, Subtarget, DAG)

**llvm/lib/Target/X86/X86SelectionDAGInfo.cpp**

- EmitTargetCodeForMemset(DAG, dl, Chain, Dst, Val, Size, Align, isVolatile, 
DstPtrInfo)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-07 Thread David Chisnall via Phabricator via cfe-commits
theraven added a comment.

I wonder if a list of comma-separated names is the right approach.  Would it 
make more sense to add a new attribute for each of the helpers, such as 
`#no-runtime-for-memcpy`? That should make querying simpler (one lookup in the 
table, rather than a lookup and a string scan) and also make it easier to add 
and remove attributes (merging is now just a matter of trying to add all of 
them, with the existing logic to deduplicate repeated attributes working).
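
At the source level the two encodings being compared would look something like
the sketch below; both spellings are made up for illustration and neither
exists in Clang at this point:

  /* one attribute per helper */
  __attribute__((no_runtime_for_memcpy))
  __attribute__((no_runtime_for_memset))
  void *impl_a(void *dst, const void *src, unsigned long n);

  /* versus a single comma-separated list */
  __attribute__((no_runtime_for("memcpy,memset")))
  void *impl_b(void *dst, const void *src, unsigned long n);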

We probably need to also carefully audit all of the transform passes to find 
ones that insert calls.  Last time I looked, there were four places in LLVM 
where memcpy could be expanded, I wonder if there are a similar number where it 
can be synthesised...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-07 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet updated this revision to Diff 198436.
gchatelet added a comment.
Herald added a subscriber: jdoerfert.

- Add documentation.
- Fix permissive HasNoRuntimeAttribute


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634

Files:
  clang/include/clang/Basic/Attr.td
  clang/include/clang/Basic/AttrDocs.td
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
  llvm/test/CodeGen/X86/memcpy.ll

Index: llvm/test/CodeGen/X86/memcpy.ll
===
--- llvm/test/CodeGen/X86/memcpy.ll
+++ llvm/test/CodeGen/X86/memcpy.ll
@@ -7,7 +7,7 @@
 
 
 ; Variable memcpy's should lower to calls.
-define i8* @test1(i8* %a, i8* %b, i64 %n) nounwind {
+define void @test1(i8* %a, i8* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test1:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -17,11 +17,11 @@
 ; DARWIN-NEXT:jmp _memcpy ## TAILCALL
 entry:
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 )
-	ret i8* %a
+  ret void
 }
 
 ; Variable memcpy's should lower to calls.
-define i8* @test2(i64* %a, i64* %b, i64 %n) nounwind {
+define void @test2(i64* %a, i64* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test2:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -33,7 +33,25 @@
 	%tmp14 = bitcast i64* %a to i8*
 	%tmp25 = bitcast i64* %b to i8*
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp14, i8* align 8 %tmp25, i64 %n, i1 0 )
-	ret i8* %tmp14
+  ret void
+}
+
+; Variable length memcpy's with disabled runtime should lower to repmovsb.
+define void @memcpy_no_runtime(i8* %a, i8* %b, i64 %n) nounwind {
+; LINUX-LABEL: memcpy_no_runtime:
+; LINUX:   # %bb.0: # %entry
+; LINUX-NEXT:movq %rdx, %rcx
+; LINUX-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; LINUX-NEXT:retq
+;
+; DARWIN-LABEL: memcpy_no_runtime:
+; DARWIN:   ## %bb.0: ## %entry
+; DARWIN-NEXT:movq %rdx, %rcx
+; DARWIN-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; DARWIN-NEXT:retq
+entry:
+	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 ) "no_runtime_for" = "memcpy"
+  ret void
 }
 
 ; Large constant memcpy's should lower to a call when optimizing for size.
Index: llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
===
--- llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
+++ llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
@@ -314,5 +314,9 @@
   Size.getValueType(), Align, isVolatile,
   AlwaysInline, DstPtrInfo, SrcPtrInfo);
 
+  /// Handle runtime sizes through repmovsb when we AlwaysInline.
+  if (AlwaysInline)
+return emitRepmovs(Subtarget, DAG, dl, Chain, Dst, Src, Size, MVT::i8);
+
   return SDValue();
 }
Index: llvm/lib/IR/IRBuilder.cpp
===
--- llvm/lib/IR/IRBuilder.cpp
+++ llvm/lib/IR/IRBuilder.cpp
@@ -96,6 +96,17 @@
   return II;
 }
 
+static void ForwardNoRuntimeAttribute(const Function *F,
+ StringRef FunctionName,
+ CallInst *CI) {
+  if (F->hasFnAttribute("no_runtime_for")) {
+const Attribute A = F->getFnAttribute("no_runtime_for");
+if (A.getValueAsString().contains(FunctionName)) {
+  CI->addAttribute(AttributeList::FunctionIndex, A);
+}
+  }
+}
+
 CallInst *IRBuilderBase::
 CreateMemSet(Value *Ptr, Value *Val, Value *Size, unsigned Align,
  bool isVolatile, MDNode *TBAATag, MDNode *ScopeTag,
@@ -103,7 +114,8 @@
   Ptr = getCastedInt8PtrValue(Ptr);
   Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};
   Type *Tys[] = { Ptr->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent();
+  Function *F = BB->getParent();
+  Module *M = F->getParent();
   Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memset, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
@@ -121,6 +133,8 @@
   if (NoAliasTag)
 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
 
+  ForwardNoRuntimeAttribute(F, "memset", CI);
+
   return CI;
 }
 
@@ -165,7 +179,8 @@
 
   Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
   Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent();
+  Function *F = BB->getParent();
+  Module *M = F->getParent();
   Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
@@ -190,6 +205,8 @@
   if (NoAliasTag)
 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
 
+  ForwardNoRuntimeAttribute(F, "memcpy", CI);
+
   return CI;
 }
 
@@ -245,7 +262,8 @@
 
   Value *Ops[] = {Dst, Src, Size, getIn

[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

2019-05-07 Thread Guillaume Chatelet via Phabricator via cfe-commits
gchatelet created this revision.
gchatelet added a reviewer: courbet.
Herald added subscribers: llvm-commits, cfe-commits, mgrang, hiraditya.
Herald added projects: clang, LLVM.

POC for the RFC http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D61634

Files:
  clang/include/clang/Basic/Attr.td
  clang/lib/CodeGen/CGCall.cpp
  clang/lib/Sema/SemaDeclAttr.cpp
  llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
  llvm/test/CodeGen/X86/memcpy.ll

Index: llvm/test/CodeGen/X86/memcpy.ll
===
--- llvm/test/CodeGen/X86/memcpy.ll
+++ llvm/test/CodeGen/X86/memcpy.ll
@@ -7,7 +7,7 @@
 
 
 ; Variable memcpy's should lower to calls.
-define i8* @test1(i8* %a, i8* %b, i64 %n) nounwind {
+define void @test1(i8* %a, i8* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test1:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -17,11 +17,11 @@
 ; DARWIN-NEXT:jmp _memcpy ## TAILCALL
 entry:
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 )
-	ret i8* %a
+  ret void
 }
 
 ; Variable memcpy's should lower to calls.
-define i8* @test2(i64* %a, i64* %b, i64 %n) nounwind {
+define void @test2(i64* %a, i64* %b, i64 %n) nounwind {
 ; LINUX-LABEL: test2:
 ; LINUX:   # %bb.0: # %entry
 ; LINUX-NEXT:jmp memcpy # TAILCALL
@@ -33,7 +33,25 @@
 	%tmp14 = bitcast i64* %a to i8*
 	%tmp25 = bitcast i64* %b to i8*
 	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp14, i8* align 8 %tmp25, i64 %n, i1 0 )
-	ret i8* %tmp14
+  ret void
+}
+
+; Variable length memcpy's with disabled runtime should lower to repmovsb.
+define void @memcpy_no_runtime(i8* %a, i8* %b, i64 %n) nounwind {
+; LINUX-LABEL: memcpy_no_runtime:
+; LINUX:   # %bb.0: # %entry
+; LINUX-NEXT:movq %rdx, %rcx
+; LINUX-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; LINUX-NEXT:retq
+;
+; DARWIN-LABEL: memcpy_no_runtime:
+; DARWIN:   ## %bb.0: ## %entry
+; DARWIN-NEXT:movq %rdx, %rcx
+; DARWIN-NEXT:rep;movsb (%rsi), %es:(%rdi)
+; DARWIN-NEXT:retq
+entry:
+	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 ) "no_runtime_for" = "memcpy"
+  ret void
 }
 
 ; Large constant memcpy's should lower to a call when optimizing for size.
Index: llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
===
--- llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
+++ llvm/lib/Target/X86/X86SelectionDAGInfo.cpp
@@ -314,5 +314,9 @@
   Size.getValueType(), Align, isVolatile,
   AlwaysInline, DstPtrInfo, SrcPtrInfo);
 
+  /// Handle runtime sizes through repmovsb when we AlwaysInline.
+  if (AlwaysInline)
+return emitRepmovs(Subtarget, DAG, dl, Chain, Dst, Src, Size, MVT::i8);
+
   return SDValue();
 }
Index: llvm/lib/IR/IRBuilder.cpp
===
--- llvm/lib/IR/IRBuilder.cpp
+++ llvm/lib/IR/IRBuilder.cpp
@@ -96,6 +96,17 @@
   return II;
 }
 
+static void ForwardNoRuntimeAttribute(const Function *F,
+ StringRef FunctionName,
+ CallInst *CI) {
+  if (F->hasFnAttribute("no_runtime_for")) {
+const Attribute A = F->getFnAttribute("no_runtime_for");
+if (A.getValueAsString().contains(FunctionName)) {
+  CI->addAttribute(AttributeList::FunctionIndex, A);
+}
+  }
+}
+
 CallInst *IRBuilderBase::
 CreateMemSet(Value *Ptr, Value *Val, Value *Size, unsigned Align,
  bool isVolatile, MDNode *TBAATag, MDNode *ScopeTag,
@@ -103,7 +114,8 @@
   Ptr = getCastedInt8PtrValue(Ptr);
   Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};
   Type *Tys[] = { Ptr->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent();
+  Function *F = BB->getParent();
+  Module *M = F->getParent();
   Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memset, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
@@ -121,6 +133,8 @@
   if (NoAliasTag)
 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
 
+  ForwardNoRuntimeAttribute(F, "memset", CI);
+
   return CI;
 }
 
@@ -165,7 +179,8 @@
 
   Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
   Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };
-  Module *M = BB->getParent()->getParent();
+  Function *F = BB->getParent();
+  Module *M = F->getParent();
   Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy, Tys);
 
   CallInst *CI = createCallHelper(TheFn, Ops, this);
@@ -190,6 +205,8 @@
   if (NoAliasTag)
 CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);
 
+  ForwardNoRuntimeAttribute(F, "memcpy", CI);
+
   return CI;
 }
 
@@ -245,7 +262,8 @@
 
   Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
   T