[PATCH] D108696: [Coroutines] [Frontend] Lookup in std namespace first

2021-11-03 Thread Xun Li via Phabricator via cfe-commits
lxfind accepted this revision.
lxfind added a comment.

I also agree that we should try to keep the compiler simple and not support the 
complicated case.
It should be fairly straightforward for a codebase to update fully to use std 
instead of std::experimental (we have a large coroutine codebase as well). 
Given that everyone is mostly supportive, I will accept the change.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108696/new/

https://reviews.llvm.org/D108696

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108696: [Coroutines] [Frontend] Lookup in std namespace first

2021-09-02 Thread Xun Li via Phabricator via cfe-commits
lxfind accepted this revision.
lxfind added a comment.
This revision is now accepted and ready to land.

LGTM. Thanks!


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108696/new/

https://reviews.llvm.org/D108696

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D108615: [Coroutines] [libcxx] Move coroutine component out of experimental namespace

2021-08-24 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

I am not familiar with the process of when to move something out of 
experimental, but I do wonder how this is normally done so that people who uses 
coroutines can have a smooth migration?
I assume that this is going to be a breaking change that existing code using 
coroutine will need to be updated and no longer compatible with old versions.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108615/new/

https://reviews.llvm.org/D108615

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105877: [Coroutines] Run coroutine passes by default

2021-08-03 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D105877#2923257 , @nikic wrote:

> I noticed that this change had a measurable impact on `O0` memory usage, 
> which I wouldn't have expected 
> (https://llvm-compile-time-tracker.com/compare.php?from=0f9e6451a836886f39137818c4f0cfd69ae31e62=8a1727ba51d262365b0d9fe10fef7e50da7022cd=max-rss).
>  Any idea what could cause it? Some additional analysis results hanging 
> around?

That's surprising. Is there a way to measure these benchmarks locally? We could 
probably find out which one is causing the issue by manually commenting out 
each coro pass and see how the number changes.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105877/new/

https://reviews.llvm.org/D105877

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D43477: [CFG] [analyzer] Add MaterializeTemporaryExpr into the construction context.

2021-08-02 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.
Herald added subscribers: manas, steakhal, ASDenysPetrov, martong, dkrupp, 
donat.nagy, Szelethus, mikhail.ramalho, baloghadamsoftware.

Hi! I have a question regarding the implementation of 
"VisitMaterializeTemporaryExpr". Specifically, I wonder if we should skip 
visiting the children? Would't visiting the children of 
MaterializeTemporaryExpr cause the same expression to be visited twice?

I am debugging a crash in ThreadSafetyAnalyzer, which triggers this assertion: 
https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/ThreadSafety.cpp#L534
Basically it's adding the same declaration twice from the same CFG block.
And I found that the redundant declaration is added during CFG construction, 
when processing cpp source code like this:

  co_return ({ static constexpr mydomain::logdevice::ErrorStacktrace::Frame 
frame{ __FUNCTION__, "logdevice/common/ZookeeperClientBase.cpp", 103}; 
mydomain::logdevice::detail::makeUnexpected(, toStatus(result.rc_)); });

which corresponds to the following AST:

  |   `-CompoundStmt 0x227b13f8
  | `-CoreturnStmt 0x227b13d0
  |   |-CXXBindTemporaryExpr 0x22776218 'folly::Unexpected':'class 
folly::Unexpected' (CXXTemporary 0x22776218)
  |   | `-StmtExpr 0x227761f0 'folly::Unexpected':'class 
folly::Unexpected'
  |   |   `-CompoundStmt 0x227761d0
  |   | |-DeclStmt 0x22771e88
  |   | | `-VarDecl 0x22771c50  used frame 'const 
mydomain::logdevice::class ErrorStacktrace::Frame':'const struct 
mydomain::logdevice::ErrorStacktrace::Frame' static constexpr listinit
  |   | |   |-value: Struct
  |   | |   | `-fields: LValue , LValue , Int 103
  |   | |   `-InitListExpr 0x22771da8 'const mydomain::logdevice::class 
ErrorStacktrace::Frame':'const struct 
mydomain::logdevice::ErrorStacktrace::Frame'
  |   | | |-ImplicitCastExpr 0x22771e00 'const char *' 

  |   | | | `-PredefinedExpr 0x22771cd8 'const char [8]' lvalue 
__FUNCTION__
  |   | | |   `-StringLiteral 0x22771cb8 'const char [8]' lvalue 
"getData"
  |   | | |-ImplicitCastExpr 0x22771e18 'const char *' 

  |   | | | `-StringLiteral 0x22771cf0 'const char [41]' lvalue 
"logdevice/common/ZookeeperClientBase.cpp"
  |   | | `-IntegerLiteral 0x22771d30 'int' 103
  |   | `-ExprWithCleanups 0x227761b8 'folly::Unexpected':'class 
folly::Unexpected'
  |   |   `-CXXBindTemporaryExpr 0x22776198 
'folly::Unexpected':'class folly::Unexpected' (CXXTemporary 0x22776198)
  |   | `-CallExpr 0x22776160 'folly::Unexpected':'class 
folly::Unexpected'
  |   |   |-ImplicitCastExpr 0x22776148 'folly::Unexpected 
(*)(const class ErrorStacktrace::Frame *, mydomain::logdevice::Status)' 

  |   |   | `-DeclRefExpr 0x227760b8 'folly::Unexpected 
(const class ErrorStacktrace::Frame *, mydomain::logdevice::Status)' lvalue 
Function 0x18b49568 'makeUnexpected' 'folly::Unexpected (const class 
ErrorStacktrace::Frame *, mydomain::logdevice::Status)'
  |   |   |-UnaryOperator 0x22771fb8 'const 
mydomain::logdevice::class ErrorStacktrace::Frame *' prefix '&' cannot overflow
  |   |   | `-DeclRefExpr 0x22771f68 'const 
mydomain::logdevice::class ErrorStacktrace::Frame':'const struct 
mydomain::logdevice::ErrorStacktrace::Frame' lvalue Var 0x22771c50 'frame' 
'const mydomain::logdevice::class ErrorStacktrace::Frame':'const struct 
mydomain::logdevice::ErrorStacktrace::Frame'
  |   |   `-CallExpr 0x227720e0 'mydomain::logdevice::Status':'enum 
mydomain::logdevice::E'
  |   | |-ImplicitCastExpr 0x227720c8 
'mydomain::logdevice::Status (*)(int)' 
  |   | | `-DeclRefExpr 0x22771ff8 'mydomain::logdevice::Status 
(int)' lvalue CXXMethod 0x221cddd0 'toStatus' 'mydomain::logdevice::Status 
(int)'
  |   | `-ImplicitCastExpr 0x22772108 'int' 
  |   |   `-MemberExpr 0x22772038 'int' lvalue .rc_ 0x21f24480
  |   | `-DeclRefExpr 0x22772018 'struct 
mydomain::logdevice::zk::GetResponse':'struct 
mydomain::logdevice::zk::GetResponse' lvalue Var 0x22731508 'result' 'struct 
mydomain::logdevice::zk::GetResponse':'struct 
mydomain::logdevice::zk::GetResponse'
  |   `-ExprWithCleanups 0x227b13b8 'void'
  | `-CXXMemberCallExpr 0x227b1378 'void'
  |   |-MemberExpr 0x227b1330 '' 
.return_value 0x227b1230
  |   | `-DeclRefExpr 0x22776238 'std::__coroutine_traits_impl > 
>::promise_type':'class folly::coro::detail::TaskPromise >' lvalue Var 0x227330a0 '__promise' 
'std::__coroutine_traits_impl > >::promise_type':'class 
folly::coro::detail::TaskPromise >'
  |   `-MaterializeTemporaryExpr 0x227b13a0 
'folly::Unexpected':'class folly::Unexpected' xvalue
  | `-CXXBindTemporaryExpr 0x22776218 
'folly::Unexpected':'class folly::Unexpected' (CXXTemporary 0x22776218)
  |   `-StmtExpr 0x227761f0 

[PATCH] D107155: [clang][deps] Substitute clang-scan-deps executable in lit tests

2021-07-30 Thread Xun Li via Phabricator via cfe-commits
lxfind accepted this revision.
lxfind added a comment.
This revision is now accepted and ready to land.

Thank you!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107155/new/

https://reviews.llvm.org/D107155

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-30 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG822b92aae439: [Coroutines] Add the newly generated SCCs back 
to the CGSCC work queue after… (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O2.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-alloca-06.ll
  llvm/test/Transforms/Coroutines/coro-alloca-07.ll
  llvm/test/Transforms/Coroutines/coro-alloca-08.ll
  llvm/test/Transforms/Coroutines/coro-async.ll
  llvm/test/Transforms/Coroutines/coro-byval-param.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch-cleanuppad.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch.ll
  llvm/test/Transforms/Coroutines/coro-debug.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-00.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-01.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-arrayalloca.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-00.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-frame-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-frame.ll
  llvm/test/Transforms/Coroutines/coro-materialize.ll
  llvm/test/Transforms/Coroutines/coro-padding.ll
  llvm/test/Transforms/Coroutines/coro-param-copy.ll
  llvm/test/Transforms/Coroutines/coro-retcon-alloca.ll
  llvm/test/Transforms/Coroutines/coro-retcon-frame.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-retcon-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon.ll
  llvm/test/Transforms/Coroutines/coro-spill-after-phi.ll
  llvm/test/Transforms/Coroutines/coro-spill-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-defs-before-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-promise.ll
  llvm/test/Transforms/Coroutines/coro-split-00.ll
  llvm/test/Transforms/Coroutines/coro-split-02.ll
  llvm/test/Transforms/Coroutines/coro-split-alloc.ll
  llvm/test/Transforms/Coroutines/coro-split-dbg.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-00.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-01.ll
  llvm/test/Transforms/Coroutines/coro-split-hidden.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail1.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail2.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail3.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-03.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-04.ll
  llvm/test/Transforms/Coroutines/coro-swifterror.ll
  llvm/test/Transforms/Coroutines/coro-zero-alloca.ll
  llvm/test/Transforms/Coroutines/no-suspend.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll
  llvm/test/Transforms/Coroutines/smoketest.ll

Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -10,12 +10,16 @@
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
 ; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -debug-pass-manager \
-; RUN: -passes='function(coro-early),cgscc(coro-split),function(coro-elide,coro-cleanup)' 2>&1 \
+; RUN: -passes='function(coro-early),function(coro-elide),cgscc(coro-split),function(coro-cleanup)' 2>&1 \
 ; RUN: | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 
+; note that we 

[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-30 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2001
 
-if ((Shape.ABI == coro::ABI::Async || Shape.ABI == coro::ABI::Retcon ||
- Shape.ABI == coro::ABI::RetconOnce) &&
-!Shape.CoroSuspends.empty()) {
-  // Run the CGSCC pipeline on the newly split functions.
-  // All clones will be in the same RefSCC, so choose a random clone.
-  UR.RCWorklist.insert(CG.lookupRefSCC(CG.get(*Clones[0])));
+if (!Shape.CoroSuspends.empty()) {
+  // Run the CGSCC pipeline on the original and newly split functions.

ychen wrote:
> aeubanks wrote:
> > ChuanqiXu wrote:
> > > I am not familiar with the Shape.ABI other than coro::ABI:switch. But the 
> > > diff line seems strange, it looks like that condition gets weaker.
> > I believe that's intentional, and a big part of this patch. We want to 
> > re-add the current SCC (and the split SCCs) any time we split an SCC. 
> > Before we weren't properly doing that.
> I got your point. So "// All clones will be in the same RefSCC " : this 
> is not accurate I think?
Note that previously this is done only for Async, Retcon and RetconOnce ABIs, 
not for the Switch ABI.
I guess that's accurate for those ABIs? But for Switch ABI this is not true.
And before we were not adding back the split functions to the pipeline to be 
properly optimized. Now we are dong that. This should help improve the 
performance of the post-split functions.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-30 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 355607.
lxfind added a comment.

fix warning


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O2.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-alloca-06.ll
  llvm/test/Transforms/Coroutines/coro-alloca-07.ll
  llvm/test/Transforms/Coroutines/coro-alloca-08.ll
  llvm/test/Transforms/Coroutines/coro-async.ll
  llvm/test/Transforms/Coroutines/coro-byval-param.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch-cleanuppad.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch.ll
  llvm/test/Transforms/Coroutines/coro-debug.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-00.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-01.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-arrayalloca.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-00.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-frame-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-frame.ll
  llvm/test/Transforms/Coroutines/coro-materialize.ll
  llvm/test/Transforms/Coroutines/coro-padding.ll
  llvm/test/Transforms/Coroutines/coro-param-copy.ll
  llvm/test/Transforms/Coroutines/coro-retcon-alloca.ll
  llvm/test/Transforms/Coroutines/coro-retcon-frame.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-retcon-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon.ll
  llvm/test/Transforms/Coroutines/coro-spill-after-phi.ll
  llvm/test/Transforms/Coroutines/coro-spill-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-defs-before-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-promise.ll
  llvm/test/Transforms/Coroutines/coro-split-00.ll
  llvm/test/Transforms/Coroutines/coro-split-02.ll
  llvm/test/Transforms/Coroutines/coro-split-alloc.ll
  llvm/test/Transforms/Coroutines/coro-split-dbg.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-00.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-01.ll
  llvm/test/Transforms/Coroutines/coro-split-hidden.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail1.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail2.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail3.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-03.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-04.ll
  llvm/test/Transforms/Coroutines/coro-swifterror.ll
  llvm/test/Transforms/Coroutines/coro-zero-alloca.ll
  llvm/test/Transforms/Coroutines/no-suspend.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll
  llvm/test/Transforms/Coroutines/smoketest.ll

Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -10,12 +10,16 @@
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
 ; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -debug-pass-manager \
-; RUN: -passes='function(coro-early),cgscc(coro-split),function(coro-elide,coro-cleanup)' 2>&1 \
+; RUN: -passes='function(coro-early),function(coro-elide),cgscc(coro-split),function(coro-cleanup)' 2>&1 \
 ; RUN: | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 
+; note that we run CoroElidePass before CoroSplitPass. This is because CoroElidePass is part of
+; function simplification pipeline, which runs before CoroSplitPass. And since @foo is not
+; a 

[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-30 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2112-2114
 StringRef Value = Attr.getValueAsString();
 LLVM_DEBUG(dbgs() << "CoroSplit: Processing coroutine '" << F.getName()
   << "' state: " << Value << "\n");

ChuanqiXu wrote:
> Refactor this into:
> ```
> LLVM_DEBUG(dbgs() << "CoroSplit: Processing coroutine '" << F.getName()
>   << "' state: " << Attr.getValueAsString() << "\n");
> ```
> could erase an warning in release build.
Good catch. How did you catch this? It seems like I don't see warning on my Mac 
by default.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-29 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 355431.
lxfind added a comment.

Put the post-split ramp function back to the CGSCC worklist


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O2.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-alloca-06.ll
  llvm/test/Transforms/Coroutines/coro-alloca-07.ll
  llvm/test/Transforms/Coroutines/coro-alloca-08.ll
  llvm/test/Transforms/Coroutines/coro-async.ll
  llvm/test/Transforms/Coroutines/coro-byval-param.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch-cleanuppad.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch.ll
  llvm/test/Transforms/Coroutines/coro-debug.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-00.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-01.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-arrayalloca.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-00.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-frame-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-frame.ll
  llvm/test/Transforms/Coroutines/coro-materialize.ll
  llvm/test/Transforms/Coroutines/coro-padding.ll
  llvm/test/Transforms/Coroutines/coro-param-copy.ll
  llvm/test/Transforms/Coroutines/coro-retcon-alloca.ll
  llvm/test/Transforms/Coroutines/coro-retcon-frame.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-retcon-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon.ll
  llvm/test/Transforms/Coroutines/coro-spill-after-phi.ll
  llvm/test/Transforms/Coroutines/coro-spill-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-defs-before-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-promise.ll
  llvm/test/Transforms/Coroutines/coro-split-00.ll
  llvm/test/Transforms/Coroutines/coro-split-02.ll
  llvm/test/Transforms/Coroutines/coro-split-alloc.ll
  llvm/test/Transforms/Coroutines/coro-split-dbg.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-00.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-01.ll
  llvm/test/Transforms/Coroutines/coro-split-hidden.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail1.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail2.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail3.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-03.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-04.ll
  llvm/test/Transforms/Coroutines/coro-swifterror.ll
  llvm/test/Transforms/Coroutines/coro-zero-alloca.ll
  llvm/test/Transforms/Coroutines/no-suspend.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll
  llvm/test/Transforms/Coroutines/smoketest.ll

Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -10,12 +10,16 @@
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
 ; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -debug-pass-manager \
-; RUN: -passes='function(coro-early),cgscc(coro-split),function(coro-elide,coro-cleanup)' 2>&1 \
+; RUN: -passes='function(coro-early),function(coro-elide),cgscc(coro-split),function(coro-cleanup)' 2>&1 \
 ; RUN: | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 
+; note that we run CoroElidePass before CoroSplitPass. This is because CoroElidePass is part of
+; function simplification pipeline, which runs before 

[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-29 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> If coroutine ramp function couldn't get inlined, it would disable coroutine 
> elide optimization. Could you elaborate more on why do you want to do that?

Ramp function will eventually be inlined, but not when you run Inliner on the 
inlinee.
Let's say coroutine A calls coroutine B, and eventually we want to inline B 
into A so that we could perform CoroElide on A.
After B is split, we don't need to run inliner again on B. When we run inliner 
on A, A will inline B.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-29 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D95807#2849053 , @aeubanks wrote:

> this will run the function simplification pipeline twice on every single 
> function when coroutines are enabled, I don't think that's the intention
>
> I thought the intention was to do all the the re-adding of SCCs inside 
> CoroSplit.cpp, including the SCC with the function that was split

Good point. I was trying to avoid the second inliner on the coroutine ramp 
function. But I guess the cost will be bigger than the win.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95807: [Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-29 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D95807#2846358 , @ChuanqiXu wrote:

>> note that we don't really need to run Inliner again on the ramp function 
>> after split
>
> This isn't accurate. The inline may run again for ramp function after split 
> and it's required by coro elide.

If there is an inlining opportunity, it should have happened pre-split, right? 
Is there any reason it didn't happen pre-split but only post-split?

> It seems like that we don't need the attribute `CORO_PRESPLIT_ATTR` any more, 
> do we? If yes, I think we should remove them.

It's still needed by the legacy pass manager. I don't want to break that yet.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D95807: [RFC][Coroutines] Add the newly generated SCCs back to the CGSCC work queue after CoroSplit actually happened

2021-06-28 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 355110.
lxfind added a comment.
Herald added subscribers: cfe-commits, qcolombet.
Herald added a project: clang.

After removing the legacy test command, I was finally able to update this 
patch. It's now ready for review. I will update the decription to reflect to 
the latest changes


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95807/new/

https://reviews.llvm.org/D95807

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp
  llvm/test/Transforms/Coroutines/ArgAddr.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O0.ll
  llvm/test/Transforms/Coroutines/coro-alloc-with-param-O2.ll
  llvm/test/Transforms/Coroutines/coro-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-alloca-06.ll
  llvm/test/Transforms/Coroutines/coro-alloca-07.ll
  llvm/test/Transforms/Coroutines/coro-alloca-08.ll
  llvm/test/Transforms/Coroutines/coro-async.ll
  llvm/test/Transforms/Coroutines/coro-byval-param.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch-cleanuppad.ll
  llvm/test/Transforms/Coroutines/coro-catchswitch.ll
  llvm/test/Transforms/Coroutines/coro-debug.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-00.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-01.ll
  llvm/test/Transforms/Coroutines/coro-eh-aware-edge-split-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-arrayalloca.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-00.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-01.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-02.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-03.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-04.ll
  llvm/test/Transforms/Coroutines/coro-frame-reuse-alloca-05.ll
  llvm/test/Transforms/Coroutines/coro-frame-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-frame.ll
  llvm/test/Transforms/Coroutines/coro-materialize.ll
  llvm/test/Transforms/Coroutines/coro-padding.ll
  llvm/test/Transforms/Coroutines/coro-param-copy.ll
  llvm/test/Transforms/Coroutines/coro-retcon-alloca.ll
  llvm/test/Transforms/Coroutines/coro-retcon-frame.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon-once-value2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values.ll
  llvm/test/Transforms/Coroutines/coro-retcon-resume-values2.ll
  llvm/test/Transforms/Coroutines/coro-retcon-unreachable.ll
  llvm/test/Transforms/Coroutines/coro-retcon-value.ll
  llvm/test/Transforms/Coroutines/coro-retcon.ll
  llvm/test/Transforms/Coroutines/coro-spill-after-phi.ll
  llvm/test/Transforms/Coroutines/coro-spill-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-defs-before-corobegin.ll
  llvm/test/Transforms/Coroutines/coro-spill-promise.ll
  llvm/test/Transforms/Coroutines/coro-split-00.ll
  llvm/test/Transforms/Coroutines/coro-split-02.ll
  llvm/test/Transforms/Coroutines/coro-split-alloc.ll
  llvm/test/Transforms/Coroutines/coro-split-dbg.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-00.ll
  llvm/test/Transforms/Coroutines/coro-split-eh-01.ll
  llvm/test/Transforms/Coroutines/coro-split-hidden.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail1.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail2.ll
  llvm/test/Transforms/Coroutines/coro-split-musttail3.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-01.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-02.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-03.ll
  llvm/test/Transforms/Coroutines/coro-split-sink-lifetime-04.ll
  llvm/test/Transforms/Coroutines/coro-swifterror.ll
  llvm/test/Transforms/Coroutines/coro-zero-alloca.ll
  llvm/test/Transforms/Coroutines/no-suspend.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll

Index: llvm/test/Transforms/Coroutines/restart-trigger.ll
===
--- llvm/test/Transforms/Coroutines/restart-trigger.ll
+++ llvm/test/Transforms/Coroutines/restart-trigger.ll
@@ -1,11 +1,14 @@
 ; REQUIRES: asserts
 ; The following tests use the new pass manager, and verify that the coroutine
 ; passes re-run the CGSCC pipeline.
-; RUN: opt < %s -S -passes='default' -enable-coroutines -debug-only=coro-split 2>&1 | FileCheck %s
-; RUN: opt < %s -S -passes='default' -enable-coroutines -debug-only=coro-split 2>&1 | FileCheck %s
+; RUN: opt < %s -S -passes='default' -enable-coroutines -debug-only=coro-split 2>&1 | FileCheck --check-prefix=CHECK-NEWPM %s
+; RUN: opt < %s -S 

[PATCH] D105066: [Coroutines] Remove CoroElide from O0 pipeline

2021-06-28 Thread Xun Li via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rG31eb696fc4cd: [Coroutines] Remove CoroElide from O0 pipeline 
(authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105066/new/

https://reviews.llvm.org/D105066

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/test/Transforms/Coroutines/smoketest.ll


Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -2,21 +2,21 @@
 ; levels and -enable-coroutines adds coroutine passes to the pipeline.
 ;
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -debug-pass-manager \
 ; RUN: 
-passes='function(coro-early),cgscc(coro-split),function(coro-elide,coro-cleanup)'
 2>&1 \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 
-; CHECK: CoroEarlyPass
-; CHECK: CoroSplitPass
-; CHECK: CoroElidePass
-; CHECK: CoroCleanupPass
+; CHECK-ALL: CoroEarlyPass
+; CHECK-ALL: CoroSplitPass
+; CHECK-OPT: CoroElidePass
+; CHECK-ALL: CoroCleanupPass
 
 define void @foo() {
   ret void
Index: llvm/lib/Passes/PassBuilder.cpp
===
--- llvm/lib/Passes/PassBuilder.cpp
+++ llvm/lib/Passes/PassBuilder.cpp
@@ -1986,7 +1986,6 @@
 
 CGSCCPassManager CGPM;
 CGPM.addPass(CoroSplitPass());
-CGPM.addPass(createCGSCCToFunctionPassAdaptor(CoroElidePass()));
 MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(std::move(CGPM)));
 
 MPM.addPass(createModuleToFunctionPassAdaptor(CoroCleanupPass()));
Index: clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
===
--- clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
+++ clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
@@ -3,23 +3,23 @@
 
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null 
\
 // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager -fcoroutines-ts 
\
-// RUN:   -O0 %s 2>&1 | FileCheck %s
+// RUN:   -O0 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null 
\
 // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager -fcoroutines-ts 
\
-// RUN:   -O1 %s 2>&1 | FileCheck %s
+// RUN:   -O1 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 //
-// CHECK: Running pass:{{.*}}CoroEarlyPass
+// CHECK-ALL: Running pass:{{.*}}CoroEarlyPass
 //
 // The first coro-split pass enqueues a second run of the entire CGSCC 
pipeline.
-// CHECK: Running pass: CoroSplitPass on (_Z3foov)
-// CHECK: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
+// CHECK-ALL: Running pass: CoroSplitPass on (_Z3foov)
+// CHECK-OPT: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
 //
 // The second coro-split pass splits coroutine 'foo' into funclets
 // 'foo.resume', 'foo.destroy', and 'foo.cleanup'.
-// CHECK: Running pass: CoroSplitPass on (_Z3foov)
-// CHECK: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
+// CHECK-ALL: Running pass: CoroSplitPass on (_Z3foov)
+// CHECK-OPT: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
 //
-// CHECK: Running pass:{{.*}}CoroCleanupPass
+// CHECK-ALL: Running pass:{{.*}}CoroCleanupPass
 
 namespace std {
 namespace experimental {


Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -2,21 +2,21 @@
 ; levels and -enable-coroutines adds coroutine passes to the pipeline.
 ;
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: 

[PATCH] D105066: [Coroutines] Remove CoroElide from O0 pipeline

2021-06-28 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Yeah, but it may be inlined  after splitting, which could trigger coro elide.

In O0, there is no inliner pass (after CoroSplit), so inlining should never 
happen.

>> in fact, we should make it illegal to mark a coroutine "always_inline", 
>> because there is no guarantee that a coroutine can be fully inlined, GCC 
>> does that
>
> To my understanding, it looks like that we shouldn't inline it since we 
> couldn't inline all parts of the function. Is this what you want to say?
> I think it may be a problem that we can't inline the full coroutine. But it's 
> not the reason to forbid it.

That's a separate topic though. Let's agree on this diff first and then I can 
explain more about the always_inline issue.

> ---
>
> Coro Elide is not defined in the standard (although it comes up in the 
> proposal). So it should be a compiler optimization. In this way, it should be 
> OK to remove it in O0.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105066/new/

https://reviews.llvm.org/D105066

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105066: [Coroutines] Remove CoroElide from O0 pipeline

2021-06-28 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D105066#2845958 , @ChuanqiXu wrote:

> On O0, it is possible to inline if the user marked the function with 
> `always_inline`.
> Since CoroElide is kind of optimization, it should be OK to skip in O0.
> Out of curiosity, what's the reason that you want to remove it?

Coroutine functions cannot be inlined before splitting, even if it's marked 
"always_inline" (in fact, we should make it illegal to mark a coroutine 
"always_inline", because there is no guarantee that a coroutine can be fully 
inlined, GCC does that).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105066/new/

https://reviews.llvm.org/D105066

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D105066: [Coroutines] Remove CoroElide from O0 pipeline

2021-06-28 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 355073.
lxfind added a comment.

update tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105066/new/

https://reviews.llvm.org/D105066

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/test/Transforms/Coroutines/smoketest.ll


Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -2,21 +2,21 @@
 ; levels and -enable-coroutines adds coroutine passes to the pipeline.
 ;
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -debug-pass-manager \
 ; RUN: 
-passes='function(coro-early),cgscc(coro-split),function(coro-elide,coro-cleanup)'
 2>&1 \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 
-; CHECK: CoroEarlyPass
-; CHECK: CoroSplitPass
-; CHECK: CoroElidePass
-; CHECK: CoroCleanupPass
+; CHECK-ALL: CoroEarlyPass
+; CHECK-ALL: CoroSplitPass
+; CHECK-OPT: CoroElidePass
+; CHECK-ALL: CoroCleanupPass
 
 define void @foo() {
   ret void
Index: llvm/lib/Passes/PassBuilder.cpp
===
--- llvm/lib/Passes/PassBuilder.cpp
+++ llvm/lib/Passes/PassBuilder.cpp
@@ -1986,7 +1986,6 @@
 
 CGSCCPassManager CGPM;
 CGPM.addPass(CoroSplitPass());
-CGPM.addPass(createCGSCCToFunctionPassAdaptor(CoroElidePass()));
 MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(std::move(CGPM)));
 
 MPM.addPass(createModuleToFunctionPassAdaptor(CoroCleanupPass()));
Index: clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
===
--- clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
+++ clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
@@ -3,23 +3,23 @@
 
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null 
\
 // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager -fcoroutines-ts 
\
-// RUN:   -O0 %s 2>&1 | FileCheck %s
+// RUN:   -O0 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null 
\
 // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager -fcoroutines-ts 
\
-// RUN:   -O1 %s 2>&1 | FileCheck %s
+// RUN:   -O1 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 //
-// CHECK: Running pass:{{.*}}CoroEarlyPass
+// CHECK-ALL: Running pass:{{.*}}CoroEarlyPass
 //
 // The first coro-split pass enqueues a second run of the entire CGSCC 
pipeline.
-// CHECK: Running pass: CoroSplitPass on (_Z3foov)
-// CHECK: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
+// CHECK-ALL: Running pass: CoroSplitPass on (_Z3foov)
+// CHECK-OPT: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
 //
 // The second coro-split pass splits coroutine 'foo' into funclets
 // 'foo.resume', 'foo.destroy', and 'foo.cleanup'.
-// CHECK: Running pass: CoroSplitPass on (_Z3foov)
-// CHECK: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
+// CHECK-ALL: Running pass: CoroSplitPass on (_Z3foov)
+// CHECK-OPT: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
 //
-// CHECK: Running pass:{{.*}}CoroCleanupPass
+// CHECK-ALL: Running pass:{{.*}}CoroCleanupPass
 
 namespace std {
 namespace experimental {


Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -2,21 +2,21 @@
 ; levels and -enable-coroutines adds coroutine passes to the pipeline.
 ;
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 

[PATCH] D105066: [Coroutines] Remove CoroElide from O0 pipeline

2021-06-28 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
lxfind added reviewers: ChuanqiXu, rjmccall.
Herald added subscribers: hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

CoroElide pass works only when a post-split coroutine is inlined into another 
post-split coroutine.
In O0, there is no inlining after CoroSplit, and hence no CoroElide can happen.
It's useless to put CoroElide pass in the O0 pipeline and it will never be 
triggered (unless I miss anything).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D105066

Files:
  clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/test/Transforms/Coroutines/smoketest.ll


Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -2,20 +2,20 @@
 ; levels and -enable-coroutines adds coroutine passes to the pipeline.
 ;
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: -debug-pass-manager 2>&1 | FileCheck %s
+; RUN: -debug-pass-manager 2>&1 | FileCheck %s 
--check-prefixes=CHECK-ALL,CHECK-OPT
 ; RUN: opt < %s -disable-output -debug-pass-manager \
 ; RUN: 
-passes='function(coro-early),cgscc(coro-split),function(coro-elide,coro-cleanup)'
 2>&1 \
-; RUN: | FileCheck %s
+; RUN: | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 
 ; CHECK: CoroEarlyPass
 ; CHECK: CoroSplitPass
-; CHECK: CoroElidePass
+; CHECK-OPT: CoroElidePass
 ; CHECK: CoroCleanupPass
 
 define void @foo() {
Index: llvm/lib/Passes/PassBuilder.cpp
===
--- llvm/lib/Passes/PassBuilder.cpp
+++ llvm/lib/Passes/PassBuilder.cpp
@@ -1986,7 +1986,6 @@
 
 CGSCCPassManager CGPM;
 CGPM.addPass(CoroSplitPass());
-CGPM.addPass(createCGSCCToFunctionPassAdaptor(CoroElidePass()));
 MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(std::move(CGPM)));
 
 MPM.addPass(createModuleToFunctionPassAdaptor(CoroCleanupPass()));
Index: clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
===
--- clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
+++ clang/test/CodeGenCoroutines/coro-newpm-pipeline.cpp
@@ -3,23 +3,23 @@
 
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null 
\
 // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager -fcoroutines-ts 
\
-// RUN:   -O0 %s 2>&1 | FileCheck %s
+// RUN:   -O0 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL
 // RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -o /dev/null 
\
 // RUN:   -fexperimental-new-pass-manager -fdebug-pass-manager -fcoroutines-ts 
\
-// RUN:   -O1 %s 2>&1 | FileCheck %s
+// RUN:   -O1 %s 2>&1 | FileCheck %s --check-prefixes=CHECK-ALL,CHECK-OPT
 //
-// CHECK: Running pass:{{.*}}CoroEarlyPass
+// CHECK-ALL: Running pass:{{.*}}CoroEarlyPass
 //
 // The first coro-split pass enqueues a second run of the entire CGSCC 
pipeline.
-// CHECK: Running pass: CoroSplitPass on (_Z3foov)
-// CHECK: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
+// CHECK-ALL: Running pass: CoroSplitPass on (_Z3foov)
+// CHECK-OPT: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
 //
 // The second coro-split pass splits coroutine 'foo' into funclets
 // 'foo.resume', 'foo.destroy', and 'foo.cleanup'.
-// CHECK: Running pass: CoroSplitPass on (_Z3foov)
-// CHECK: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
+// CHECK-ALL: Running pass: CoroSplitPass on (_Z3foov)
+// CHECK-OPT: Running pass:{{.*}}CoroElidePass{{.*}} on {{.*}}_Z3foov{{.*}}
 //
-// CHECK: Running pass:{{.*}}CoroCleanupPass
+// CHECK-ALL: Running pass:{{.*}}CoroCleanupPass
 
 namespace std {
 namespace experimental {


Index: llvm/test/Transforms/Coroutines/smoketest.ll
===
--- llvm/test/Transforms/Coroutines/smoketest.ll
+++ llvm/test/Transforms/Coroutines/smoketest.ll
@@ -2,20 +2,20 @@
 ; levels and -enable-coroutines adds coroutine passes to the pipeline.
 ;
 ; RUN: opt < %s -disable-output -passes='default' -enable-coroutines \
-; RUN: 

[PATCH] D102465: [Coroutines] Mark every parameter

2021-05-13 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added projects: clang, LLVM.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D102465

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/Inputs/coroutine.h
  clang/test/CodeGenCoroutines/coro-param-memcpy.cpp
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -1766,6 +1766,11 @@
   bool ReuseFrameSlot) {
   PrettyStackTraceFunction prettyStackTrace(F);
 
+  for (Instruction  : make_early_inc_range(instructions(F)))
+if (auto *II = dyn_cast())
+  if (II->getIntrinsicID() == Intrinsic::coro_mark_param)
+II->eraseFromParent();
+
   // The suspend-crossing algorithm in buildCoroutineFrame get tripped
   // up by uses in unreachable blocks, so remove them as a first pass.
   removeUnreachableBlocks(F);
Index: llvm/include/llvm/IR/Intrinsics.td
===
--- llvm/include/llvm/IR/Intrinsics.td
+++ llvm/include/llvm/IR/Intrinsics.td
@@ -1274,6 +1274,8 @@
  ReadOnly>,
  NoCapture>]>;
 
+def int_coro_mark_param : Intrinsic<[], [llvm_ptr_ty], []>;
+
 ///===-- Other Intrinsics --===//
 //
 def int_trap : Intrinsic<[], [], [IntrNoReturn, IntrCold]>,
@@ -1305,8 +1307,8 @@
 def int_sideeffect : DefaultAttrsIntrinsic<[], [], [IntrInaccessibleMemOnly, IntrWillReturn]>;
 
 // The pseudoprobe intrinsic works as a place holder to the block it probes.
-// Like the sideeffect intrinsic defined above, this intrinsic is treated by the 
-// optimizer as having opaque side effects so that it won't be get rid of or moved 
+// Like the sideeffect intrinsic defined above, this intrinsic is treated by the
+// optimizer as having opaque side effects so that it won't be get rid of or moved
 // out of the block it probes.
 def int_pseudoprobe : Intrinsic<[], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],
 [IntrInaccessibleMemOnly, IntrWillReturn]>;
Index: clang/test/CodeGenCoroutines/coro-param-memcpy.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-param-memcpy.cpp
@@ -0,0 +1,44 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++17 -O1 -fno-inline -emit-llvm %s -o - | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+struct task {
+  struct promise_type {
+task get_return_object() { return {this}; }
+std::experimental::suspend_always initial_suspend() { return {}; }
+std::experimental::suspend_always final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+  };
+  promise_type *promise;
+};
+
+namespace std::experimental {
+template 
+struct coroutine_traits {
+  using promise_type = typename task::promise_type;
+};
+} // namespace std::experimental
+
+void *g = nullptr;
+
+struct A {
+  unsigned long long a = 1;
+  unsigned long long b;
+  unsigned int c;
+};
+
+task foo(A a1) {
+  A a2 = a1; // Necessary.
+  g =// So the address isn't optimized out.
+  co_return;
+}
+
+// verify that the entire struct param is in the frame.
+// CHECK: %_Z3foo1A.Frame = type { void (%_Z3foo1A.Frame*)*, void (%_Z3foo1A.Frame*)*, %"struct.task::promise_type", i1, %"struct.std::experimental::coroutines_v1::suspend_always", [5 x i8], [24 x i8] }
+
+// CHECK-LABEL: define dso_local %"struct.task::promise_type"* @_Z3foo1A(
+// CHECK: %[[FRAME:.+]] = call noalias nonnull i8* @_Znwm(
+// CHECK: %[[PTR:.+]] = getelementptr inbounds i8, i8* %[[FRAME]], i64 24
+// CHECK: %[[PARAM:.+]] = bitcast %struct.A* %a1 to i8*
+// CHECK: call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(24) %[[PTR]], i8* noundef nonnull align 8 dereferenceable(24) %[[PARAM]], i64 24, i1 false)
Index: clang/test/CodeGenCoroutines/Inputs/coroutine.h
===
--- clang/test/CodeGenCoroutines/Inputs/coroutine.h
+++ clang/test/CodeGenCoroutines/Inputs/coroutine.h
@@ -67,9 +67,9 @@
   }
 
 struct suspend_always {
-  bool await_ready() { return false; }
-  void await_suspend(coroutine_handle<>) {}
-  void await_resume() {}
+  bool await_ready() noexcept { return false; }
+  void await_suspend(coroutine_handle<>) noexcept {}
+  void await_resume() noexcept {}
 };
 struct suspend_never {
   bool await_ready() noexcept { return true; }
Index: 

[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-23 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Sorry for the confusion. I think either overaligned or under-aligned could be 
> used here to describe the problem: either "Handle overaligned frame" or "Fix 
> under-aligned frame". Since c++ spec defines the former but not the later 
> (https://en.cppreference.com/w/cpp/language/object#Alignment), my first 
> intuition was to use the term "overalign". Under-aligned is the undesired 
> outcome that should be fixed (probably too late to handle I assume). Also the 
> overaligned is a static property whereas 'under-aligned" is a runtime 
> property. From the compiler's perspective, I think overaligned should be 
> preferred. With that said, I don't feel strongly about this. I could switch 
> to use "overaligned" if that feels more intuitive.

"Handle" is probably not the right word to be used here. What follows "handle" 
is typically a legit situation that already occurred but not current handled 
properly. Here "overaligned frame" doesn't already occur. From what I 
understand, you really just want to support promise object alignment. So why 
not just say that directly?
To add on that, I do think you need to describe the problem in more detail in 
the description. It's indeed still confusing.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-21 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Thanks for working on this.
I am still having a bit hard time understanding the solution.
A few questions:

1. I assume this patch is to solve the problem where the promise object is not 
aligned according to its alignof annotation, right? The title/wording is a bit 
misleading. Usually "handling XXX" means XXX is a situation/problem that wasn't 
handle properly before, and it's being handled here. I don't really understand 
what "handle overaligned frame allocation" means. Isn't frame allocation 
under-aligned being the problem?
2. What is the purpose of coro.align intrinsic?
3. Could you provide some examples of what the IR might look like after this 
patch? Either that or a more detailed explanation of how this works in the 
summary.
4. Do you think it might be cleaner to introduce a new variant of coro.size 
instead of adding arguments to it? For example, coro.size.aligned(). This way, 
you can avoid changing any test file for non-switch-lowering test files, but 
focus on all switch-lowering tests.
5. Typically, coro.free is used by a comparison with nullptr. This is to enable 
CoroElide. See: https://llvm.org/docs/Coroutines.html#llvm-coro-free-intrinsic. 
So I don't think you can load from it directly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-21 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

ChuanqiXu wrote:
> ychen wrote:
> > ChuanqiXu wrote:
> > > ychen wrote:
> > > > lxfind wrote:
> > > > > Do we need to change __builtin_coro_size? The argument will always be 
> > > > > 1, right?
> > > > > It only starts to change in LLVM intrinsics, if I read the impl 
> > > > > correctly.
> > > > Yeah, It is always 1 for Clang until the spec is fixed (then we could 
> > > > revert it back to 0).  Other clients using `__builtin_coro_size` may 
> > > > use 0 if the client doesn't care about overaligned frame or it could 
> > > > handle overaligned frame by itself. 
> > > BTW, is it OK to edit the `builtin`s directly? Since builtin is different 
> > > with intrinsic which is only visible in the internal of compiler, builtin 
> > > could be used by any end users. Although I know there should be  little 
> > > users who would use `__builtin_coro` APIs, I worry if there is any guide 
> > > principle for editing the `builtin`s.
> > > BTW, is it OK to edit the builtins directly? Since builtin is different 
> > > with intrinsic which is only visible in the internal of compiler, builtin 
> > > could be used by any end users. Although I know there should be little 
> > > users who would use __builtin_coro APIs, I worry if there is any guide 
> > > principle for editing the builtins.
> > 
> > I think it is ok to change these if it is justified like anything else.
> > 
> > builtins/intrinsics are interfaces on different levels. I'm trying to make 
> > __builtin_coro_size consistent with llvm.coro.size because I don't have a 
> > good reason for not doing that. (assume that we keep this opt-in 
> > overaligned frame handling in LLVM even after the spec is fixed since it 
> > helps solve a practical problem and the maintenance cost is low)
> > 
> > 
> It doesn't make sense to me that we need to change the signature for 
> `__builtin_coro_size` in this patch. In other words, why do we need to change 
> `__builtin_coro_size `? What are problems that can't be solved if we don't 
> change `__builtin_coro_size`? At least, if it is necessary to change 
> `__builtin_coro_size`, we could make it in successive patches.
Yeah I agree with ChuanqiXu, there is no need to make the builtin to be exactly 
the same as the llvm intrinsics just because they have the same name. Many of 
them are different even though they have the same name.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100739: [Coroutines] Handle overaligned frame allocation (2)

2021-04-20 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/docs/LanguageExtensions.rst:2689
 
-  size_t __builtin_coro_size()
+  size_t __builtin_coro_size(bool alloc)
   void  *__builtin_coro_frame()

Do we need to change __builtin_coro_size? The argument will always be 1, right?
It only starts to change in LLVM intrinsics, if I read the impl correctly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100739/new/

https://reviews.llvm.org/D100739

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function

2021-04-20 Thread Xun Li via Phabricator via cfe-commits
lxfind planned changes to this revision.
lxfind added a comment.

Plan to add documentation, fix Legacy pass and address comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100415/new/

https://reviews.llvm.org/D100415

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass

2021-04-19 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D100282#2699171 , @rjmccall wrote:

> MLIR is an in-tree project that can be updated.

Sure, but I think there are some important differences.
As far as I understand, in MLIR, unlike in C++/Swift frontend where a coroutine 
function body is represented by an explicit AST type, there is no concept for 
coroutine functions. Instead functions just contain async dialects. So for MLIR 
to properly annotate coroutine functions, it will need to look for either those 
dialects or these intrinsics after IRGen in order to do so, which is pretty 
much the same thing that we were doing in CoroEarly to annotate coroutine 
functions. The complexity introduced by duplicating this to all frontends, 
especially in MLIR where we need to do the same thing as we were doing in 
CoroEarly, seems to out-weight the benefits on conceptual clarity.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass

2021-04-18 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

This broke MLIR tests.
It seems that MLIR tests depend on CoroEarly to be able to annotate coroutine 
function properly based on the intrinsics.
Given that, I am now convinced we shouldn't set the attribute in the frontend. 
Instead we should simply move CoroEarly to before AlwaysInliner.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass

2021-04-18 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG2b50f5a4343f: [Coroutines] Move CoroEarly pass to before 
AlwaysInliner (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-always-inline-resume.cpp
  clang/test/CodeGenCoroutines/coro-always-inline.cpp
  llvm/lib/Transforms/Coroutines/CoroEarly.cpp
  llvm/test/Transforms/Coroutines/coro-debug-O2.ll
  llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll
  llvm/test/Transforms/Coroutines/coro-split-01.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/ex0.ll
  llvm/test/Transforms/Coroutines/ex1.ll
  llvm/test/Transforms/Coroutines/ex2.ll
  llvm/test/Transforms/Coroutines/ex3.ll
  llvm/test/Transforms/Coroutines/ex4.ll
  llvm/test/Transforms/Coroutines/ex5.ll
  llvm/test/Transforms/Coroutines/phi-coro-end.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll

Index: llvm/test/Transforms/Coroutines/restart-trigger.ll
===
--- llvm/test/Transforms/Coroutines/restart-trigger.ll
+++ llvm/test/Transforms/Coroutines/restart-trigger.ll
@@ -12,7 +12,7 @@
 ; CHECK:  CoroSplit: Processing coroutine 'f' state: 0
 ; CHECK-NEXT: CoroSplit: Processing coroutine 'f' state: 1
 
-define void @f() {
+define void @f() "coroutine.presplit"="0" {
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
   %alloc = call i8* @malloc(i32 %size)
Index: llvm/test/Transforms/Coroutines/phi-coro-end.ll
===
--- llvm/test/Transforms/Coroutines/phi-coro-end.ll
+++ llvm/test/Transforms/Coroutines/phi-coro-end.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex5.ll
===
--- llvm/test/Transforms/Coroutines/ex5.ll
+++ llvm/test/Transforms/Coroutines/ex5.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex4.ll
===
--- llvm/test/Transforms/Coroutines/ex4.ll
+++ llvm/test/Transforms/Coroutines/ex4.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %promise = alloca i32
   %pv = bitcast i32* %promise to i8*
Index: llvm/test/Transforms/Coroutines/ex3.ll
===
--- llvm/test/Transforms/Coroutines/ex3.ll
+++ llvm/test/Transforms/Coroutines/ex3.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex2.ll
===
--- llvm/test/Transforms/Coroutines/ex2.ll
+++ llvm/test/Transforms/Coroutines/ex2.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
Index: llvm/test/Transforms/Coroutines/ex1.ll
===
--- llvm/test/Transforms/Coroutines/ex1.ll
+++ llvm/test/Transforms/Coroutines/ex1.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 ; RUN: opt < %s 

[PATCH] D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function

2021-04-18 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2231
 // coroutine.
 struct CoroSplitLegacy : public CallGraphSCCPass {
   static char ID; // Pass identification, replacement for typeid

ChuanqiXu wrote:
> I am not familiar with the policy in LLVM that how should we treat LegacyPass 
> in trunk. I mean, are we responsible to update the LegacyPassManager?
Yes I think so. I will deal with the legacypass latter.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100415/new/

https://reviews.llvm.org/D100415

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function

2021-04-15 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

@ChuanqiXu Thank you for the detailed review! Really appreciate it.
I agree we should create a coroutine benchmark at some point, ideally some 
realistic production-code driven benchmark. We can work on that in the future. 
For this patch, it's probably not worth it to hide it behind an option, for two 
reasons: 1) it would be extremely complicated, 2) most parameters would end up 
on the frame anyway 3) this patch actually doesn't force parameters to be put 
on the frame. Before frame creation, all the parameters are put back to 
allocas, the current alloca analysis and optimization still applies to them. So 
some parameters may actually end up not put on the frame. So I wouldn't expect 
this to increase frame size in most cases.

I will add documentation latter once the we all agree on the high-level 
idea/direction of this patch.




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:646
 
+Builder.CreateCall(CGM.getIntrinsic(llvm::Intrinsic::coro_init_end));
+Builder.CreateBr(InitReadyBB);

ChuanqiXu wrote:
> It calls `coro.init.end` without calling `coro.init` in the front which looks 
> odd.
This path is conditionally guarded by `coro.init` alrady.



Comment at: llvm/lib/Transforms/Coroutines/CoroEarly.cpp:193
+F.getName() + ".ramp");
+NewF->addFnAttr(Attribute::NoInline);
+M->getFunctionList().push_back(NewF);

ChuanqiXu wrote:
> Noticed that this patch deletes `F.addFnAttr(CORO_PRESPLIT_ATTR, 
> UNPREPARED_FOR_SPLIT);` below, is it conflicting with `D100282 `. I want to 
> know if we still ned to add `Noinline` attribute once `D100282 ` checked in.
Good question. For now they are somewhat redundant. We probably don't need to 
add NoInline here.



Comment at: llvm/lib/Transforms/Coroutines/CoroEarly.cpp:218
+II->replaceAllUsesWith(
+llvm::ConstantInt::get(llvm::Type::getInt1Ty(C), 0));
+break;

ChuanqiXu wrote:
> Why do we need to replace `coro.alloc` with 0 now?
> Replace `coro.alloc` with 0 implies we should allocate the frame in the 
> stack. I think we can't know how should we allocate the frame now.
This is replacing it in the NewF (the cloned new ramp function). We only need 
to allocate the frame once, which will be done in the init function. So in the 
ramp function we can always skip it.



Comment at: llvm/lib/Transforms/Coroutines/CoroEarly.cpp:333
   CF->setArgOperand(0, CoroId);
+splitRampFunction(F);
+  }

ChuanqiXu wrote:
> Should we give a another name for `splitRampFunction`? It may be surprising 
> to see `split` in Coro-early pass instead of Coro-split pass.
> BTW, how do you think about create the ramp function in the CodeGen process 
> of frontend?
I thought about doing it in CodeGen. But it's really complicated to split 
functions in CodeGen.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100415/new/

https://reviews.llvm.org/D100415

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass

2021-04-13 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D100282#2687532 , @ychen wrote:

> I think the setting is in CoroEarly from the beginning is that it is an 
> implementation detail? Clients should only worry about coroutine shape.  
> Maybe we could set `noinline` in frontends to express the intent and remove 
> it in coroearly/corosplit?

We cannot do that, because we need to distinguish between user-specified 
noinline vs coroutine. Some coroutines in theory could potentially be inlined.
Our choice is really just between setting it in the front-end or moving 
CoroEarly to the beginning of the pipeline.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function

2021-04-13 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 337290.
lxfind added a comment.

some cleanups


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100415/new/

https://reviews.llvm.org/D100415

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroEarly.cpp
  llvm/lib/Transforms/Coroutines/CoroInternal.h
  llvm/lib/Transforms/Coroutines/CoroSplit.cpp

Index: llvm/lib/Transforms/Coroutines/CoroSplit.cpp
===
--- llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -2049,6 +2049,74 @@
 Fns.push_back(PrepareFn);
 }
 
+static Function *getCoroInitFunction(Function ) {
+  StringRef RampName = RampFunc.getName();
+  assert(RampName.endswith(".ramp") && "Ramp function must ends with .ramp");
+  StringRef InitName = RampName.substr(0, RampName.size() - 5);
+  return RampFunc.getParent()->getFunction(InitName);
+}
+
+static Function *inlineRampFunction(Function ) {
+  CallInst *RampCall = cast(
+  &*llvm::find_if(instructions(F), [&](const Instruction ) {
+if (const CallInst *CI = dyn_cast())
+  return CI->getCalledFunction()->getName().startswith(F.getName());
+return false;
+  }));
+  InlineFunctionInfo IFI;
+  InlineFunction(*RampCall, IFI);
+
+  SmallVector CoroIds;
+  CoroBeginInst *CoroBegin = nullptr;
+  SmallVector CoroFrameGets;
+  for (Instruction  : instructions(F)) {
+auto *II = dyn_cast();
+if (!II)
+  continue;
+switch (II->getIntrinsicID()) {
+default:
+  break;
+case Intrinsic::coro_id:
+  CoroIds.push_back(II);
+  break;
+case Intrinsic::coro_begin:
+  CoroBegin = cast(II);
+  break;
+case Intrinsic::coro_frame_get:
+  CoroFrameGets.push_back(II);
+  break;
+}
+  }
+  assert(CoroIds.size() == 2 && "There must be two coro.id calls, from the "
+"init function and ramp function respectively");
+  CoroIdInst *RealId = cast(CoroBegin->getId());
+  for (IntrinsicInst *I : CoroIds)
+if (I != RealId)
+  I->replaceAllUsesWith(RealId);
+  DenseMap FrameSlotMap;
+  for (IntrinsicInst *FrameGet : CoroFrameGets) {
+bool IsPromise = cast(FrameGet->getOperand(2))->getZExtValue();
+uint32_t SlotID =
+cast(FrameGet->getOperand(3))->getZExtValue();
+auto Itr = FrameSlotMap.find(SlotID);
+Instruction *Ptr;
+if (Itr == FrameSlotMap.end()) {
+  Ptr = cast(FrameGet->getOperand(1));
+  FrameSlotMap[SlotID] = Ptr;
+} else {
+  Ptr = Itr->second;
+}
+FrameGet->replaceAllUsesWith(Ptr);
+FrameGet->eraseFromParent();
+if (IsPromise) {
+  RealId->setOperand(1, new BitCastInst(Ptr->stripPointerCasts(),
+Ptr->getType(), "", RealId));
+}
+  }
+
+  return RampCall->getCalledFunction();
+}
+
 PreservedAnalyses CoroSplitPass::run(LazyCallGraph::SCC ,
  CGSCCAnalysisManager ,
  LazyCallGraph , CGSCCUpdateResult ) {
@@ -2082,6 +2150,8 @@
 }
   }
 
+  SmallVector UnpreparedInitFuncs;
+  SmallVector InlinedRampFuncs;
   // Split all the coroutines.
   for (LazyCallGraph::Node *N : Coroutines) {
 Function  = N->getFunction();
@@ -2089,12 +2159,24 @@
 StringRef Value = Attr.getValueAsString();
 LLVM_DEBUG(dbgs() << "CoroSplit: Processing coroutine '" << F.getName()
   << "' state: " << Value << "\n");
-if (Value == UNPREPARED_FOR_SPLIT) {
+if (Value == DO_NOT_PROCESS)
+  continue;
+if (Value == UNPREPARED_FOR_SPLIT_RAMP) {
   // Enqueue a second iteration of the CGSCC pipeline on this SCC.
   UR.CWorklist.insert();
-  F.addFnAttr(CORO_PRESPLIT_ATTR, PREPARED_FOR_SPLIT);
+  // Once we allow the ramp function to be optimized, we will split
+  // the init function directly and ignore the ramp function.
+  F.addFnAttr(CORO_PRESPLIT_ATTR, DO_NOT_PROCESS);
+  UnpreparedInitFuncs.push_back(getCoroInitFunction(F));
   continue;
 }
+if (Value == PREPARED_FOR_SPLIT_INIT) {
+  Function *RampFunc = inlineRampFunction(F);
+  InlinedRampFuncs.push_back(RampFunc);
+  RampFunc->removeDeadConstantUsers();
+  RampFunc->dropAllReferences();
+  updateCGAndAnalysisManagerForCGSCCPass(CG, C, *N, AM, UR, FAM);
+}
 F.removeFnAttr(CORO_PRESPLIT_ATTR);
 
 SmallVector Clones;
@@ -2109,6 +2191,23 @@
   UR.RCWorklist.insert(CG.lookupRefSCC(CG.get(*Clones[0])));
 }
   }
+  for (Function *F : UnpreparedInitFuncs)
+F->addFnAttr(CORO_PRESPLIT_ATTR, PREPARED_FOR_SPLIT_INIT);
+  for (Function *DeadF : InlinedRampFuncs) {
+auto  = *CG.lookupSCC(*CG.lookup(*DeadF));
+FAM.clear(*DeadF, DeadF->getName());
+AM.clear(DeadC, DeadC.getName());
+auto  = DeadC.getOuterRefSCC();
+CG.removeDeadFunction(*DeadF);
+
+ 

[PATCH] D100415: [Coroutines] Split coroutine during CoroEarly into an init and ramp function

2021-04-13 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added projects: clang, LLVM.

A coroutine has the following structure in LLVM IR:

  entry:
alloca ..
%promise = alloca ...
%0 = call token @llvm.coro.id(..., %promise
%1 = call i1 @llvm.coro.alloc(token %0)
br i1 %1, label %coro.alloc, label %coro.init
  
  coro.alloc:
%2 = call i64 @llvm.coro.size.i64()
%call = call noalias nonnull i8* @_Znwm(i64 %2)
br label %coro.init
  
  coro.init:   ; preds = %coro.alloc, %entry
%3 = phi i8* [ null, %entry ], [ %call, %coro.alloc ]
%4 = call i8* @llvm.coro.begin(token %0, i8* %3)
...
move parameters to stack alloca
create promise object
...
actual coroutine body
...


It uses coro.id to uniquely identify the coroutine (which also refers to the 
promise object), use coro.alloc to decide whether to create frame on the heap, 
and use coro.begin the mark the frame object.
After that, it always moves all parameters to stack (stored in allocas), create 
the promise object by calling its constructor.
Finally it emits the actual coroutine body code.

Having all of these in the same function creates problems: optimization passes 
blend the initialization code with the coroutine body and move code around, 
which latter violates some of the requirements by coroutines.
There are two examples:

1. Frame objects accessed before coro.begin: coro.begin returns the frame 
pointer, that is, the frame is only ready after coro.begin. If any value is 
used across coroutine suspension and needs to be put on the frame, they need to 
be accessed through the frame instead of alloca. This is easy if the value is 
first accessed after coro.begin: we can just replace all their references by a 
pointer to the frame. However if a value is accessed before coro.begin, but 
also need to live on the frame, we are in trouble. D66230 
 made an initial attempt to fix this, but it 
wasn't complete. I made the fix more robust in D86859 
, which introduced a lot of complexity to 
AllocaUseVisitor. The basic idea is that we track every alloca use (both 
explicit and implicit through aliases) before coro.begin, and if they are 
touched we copy them into the frame after coro.begin. This is however not 
bullet-proof. If there exists complicated phi nodes, we may end up having to 
copy every single alloca to the frame. This patch separate the code before 
coro.begin and after coro.begin, making it impossible for optimization passes 
to mess around. There can be no complicated access to the frame before frame 
creation.
2. Captured by-val parameter through MemCpyOptPass: 
https://bugs.llvm.org/show_bug.cgi?id=48857. To summarize the problem, in the 
coroutine IR, a first mem.copy copies a passed-by-value parameter to a local 
allloca, and latter (after a coroutine suspension) copies the local alloca to 
another local alloca. MemCpyOptPass merges them and turns the second copy to be 
copying directly from the parameter to the second local alloca. This will lead 
to crash because the passed-by-value parameter pointer would have died after 
the coroutine suspension. This patch separate the parameter copy code and the 
coroutine body, making this kind of optimizations impossible.

Overall, we want to split the coroutine as much as possible as early as 
possible to avoid any kind of violations of coroutine propertiers from 
optimization passes.

To split the coroutine early, this patch splits the coroutine right after 
parameter move during CoroEarly pass. Anything before remain in the original 
function (called init function), and the rest is put into a new function 
(called ramp function). It's done through 3 steps:

1. In CGCoroutine, we need to emit a few new intrinsic instructions that 
CoroEarly can use to correctly split the function. First of all, the parameter 
move should only happen once in the init function. To achieve this effect, a 
new intrinsic coro.init() is created that returns a boolean value. It will 
return true in the init function while false in the ramp function. This allows 
us to control the behavior difference between init and ramp. Secondly, we need 
a marker that tells CoroEarly pass that the init function part is done, and the 
rest belongs to the ramp function. This is achieved by a new intrinsic 
coro.init.end(). This essentially marks the splitting point in CoroEarly split. 
Finally, every alloca that's storing the parameter copies will be annotated 
with metadata, indicating that they are parameters and will be used in the ramp 
function. The same thing is done to the promise object. These should be the 
only allocas that need to be used across init and ramp function. Such metadata 
will allow us to 

[PATCH] D100282: [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass

2021-04-13 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 337259.
lxfind added a comment.

Update test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-always-inline-resume.cpp
  clang/test/CodeGenCoroutines/coro-always-inline.cpp
  llvm/lib/Transforms/Coroutines/CoroEarly.cpp
  llvm/test/Transforms/Coroutines/coro-debug-O2.ll
  llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll
  llvm/test/Transforms/Coroutines/coro-split-01.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/ex0.ll
  llvm/test/Transforms/Coroutines/ex1.ll
  llvm/test/Transforms/Coroutines/ex2.ll
  llvm/test/Transforms/Coroutines/ex3.ll
  llvm/test/Transforms/Coroutines/ex4.ll
  llvm/test/Transforms/Coroutines/ex5.ll
  llvm/test/Transforms/Coroutines/phi-coro-end.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll

Index: llvm/test/Transforms/Coroutines/restart-trigger.ll
===
--- llvm/test/Transforms/Coroutines/restart-trigger.ll
+++ llvm/test/Transforms/Coroutines/restart-trigger.ll
@@ -12,7 +12,7 @@
 ; CHECK:  CoroSplit: Processing coroutine 'f' state: 0
 ; CHECK-NEXT: CoroSplit: Processing coroutine 'f' state: 1
 
-define void @f() {
+define void @f() "coroutine.presplit"="0" {
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
   %alloc = call i8* @malloc(i32 %size)
Index: llvm/test/Transforms/Coroutines/phi-coro-end.ll
===
--- llvm/test/Transforms/Coroutines/phi-coro-end.ll
+++ llvm/test/Transforms/Coroutines/phi-coro-end.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex5.ll
===
--- llvm/test/Transforms/Coroutines/ex5.ll
+++ llvm/test/Transforms/Coroutines/ex5.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex4.ll
===
--- llvm/test/Transforms/Coroutines/ex4.ll
+++ llvm/test/Transforms/Coroutines/ex4.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %promise = alloca i32
   %pv = bitcast i32* %promise to i8*
Index: llvm/test/Transforms/Coroutines/ex3.ll
===
--- llvm/test/Transforms/Coroutines/ex3.ll
+++ llvm/test/Transforms/Coroutines/ex3.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex2.ll
===
--- llvm/test/Transforms/Coroutines/ex2.ll
+++ llvm/test/Transforms/Coroutines/ex2.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
Index: llvm/test/Transforms/Coroutines/ex1.ll
===
--- llvm/test/Transforms/Coroutines/ex1.ll
+++ llvm/test/Transforms/Coroutines/ex1.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* 

[PATCH] D100282: [Coroutines] Move CoroEarly pass to before AlwaysInliner

2021-04-12 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 336850.
lxfind added a comment.

Set the attributes in Clang instead of CoroEarly


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-always-inline-resume.cpp
  clang/test/CodeGenCoroutines/coro-always-inline.cpp
  llvm/lib/Transforms/Coroutines/CoroEarly.cpp
  llvm/test/Transforms/Coroutines/coro-debug-O2.ll
  llvm/test/Transforms/Coroutines/coro-debug-frame-variable.ll
  llvm/test/Transforms/Coroutines/coro-split-01.ll
  llvm/test/Transforms/Coroutines/coro-split-recursive.ll
  llvm/test/Transforms/Coroutines/ex0.ll
  llvm/test/Transforms/Coroutines/ex1.ll
  llvm/test/Transforms/Coroutines/ex2.ll
  llvm/test/Transforms/Coroutines/ex3.ll
  llvm/test/Transforms/Coroutines/ex4.ll
  llvm/test/Transforms/Coroutines/ex5.ll
  llvm/test/Transforms/Coroutines/phi-coro-end.ll
  llvm/test/Transforms/Coroutines/restart-trigger.ll

Index: llvm/test/Transforms/Coroutines/restart-trigger.ll
===
--- llvm/test/Transforms/Coroutines/restart-trigger.ll
+++ llvm/test/Transforms/Coroutines/restart-trigger.ll
@@ -12,7 +12,7 @@
 ; CHECK:  CoroSplit: Processing coroutine 'f' state: 0
 ; CHECK-NEXT: CoroSplit: Processing coroutine 'f' state: 1
 
-define void @f() {
+define void @f() "coroutine.presplit"="0" {
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
   %alloc = call i8* @malloc(i32 %size)
Index: llvm/test/Transforms/Coroutines/phi-coro-end.ll
===
--- llvm/test/Transforms/Coroutines/phi-coro-end.ll
+++ llvm/test/Transforms/Coroutines/phi-coro-end.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex5.ll
===
--- llvm/test/Transforms/Coroutines/ex5.ll
+++ llvm/test/Transforms/Coroutines/ex5.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex4.ll
===
--- llvm/test/Transforms/Coroutines/ex4.ll
+++ llvm/test/Transforms/Coroutines/ex4.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %promise = alloca i32
   %pv = bitcast i32* %promise to i8*
Index: llvm/test/Transforms/Coroutines/ex3.ll
===
--- llvm/test/Transforms/Coroutines/ex3.ll
+++ llvm/test/Transforms/Coroutines/ex3.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %size = call i32 @llvm.coro.size.i32()
Index: llvm/test/Transforms/Coroutines/ex2.ll
===
--- llvm/test/Transforms/Coroutines/ex2.ll
+++ llvm/test/Transforms/Coroutines/ex2.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -S | FileCheck %s
 ; RUN: opt < %s -passes='default' -enable-coroutines -S | FileCheck %s
 
-define i8* @f(i32 %n) {
+define i8* @f(i32 %n) "coroutine.presplit"="0" {
 entry:
   %id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
   %need.dyn.alloc = call i1 @llvm.coro.alloc(token %id)
Index: llvm/test/Transforms/Coroutines/ex1.ll
===
--- llvm/test/Transforms/Coroutines/ex1.ll
+++ llvm/test/Transforms/Coroutines/ex1.ll
@@ -2,7 +2,7 @@
 ; RUN: opt < %s -O2 -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 ; RUN: opt < %s -aa-pipeline=basic-aa -passes='default' -enable-coroutines -preserve-alignment-assumptions-during-inlining=false -S | FileCheck %s
 

[PATCH] D100282: [Coroutines] Move CoroEarly pass to before AlwaysInliner

2021-04-11 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Ah, if the pass does more than just setting the attribute, then sure, it 
> makes sense to keep it.  But I do think we should be requiring the attribute 
> to be added by frontends, since it's really an IR invariant that it's present 
> on all unlowered coroutines.

By the way, it also sets these attributes for other types of coroutines (retcon 
and async). So the down-side would be then we need to do this for all 
front-ends (clang and swift).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100282: [Coroutines] Move CoroEarly pass to before AlwaysInliner

2021-04-11 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D100282#2682251 , @rjmccall wrote:

> Why does this pass even exist?  We should just expect the frontend to set the 
> attribute.  It's not like frontends don't have to otherwise know that they're 
> emitting a coroutine; a ton of things about the expected entire IR pattern 
> are different.

The attribute setting can totally be moved to the front-end. 
One thing that's not clear to me is whether we should simply set coroutine 
functions noinline instead of replying on the attributres for this.
GCC seems to complain about inlining coroutines: 
https://godbolt.org/z/KrzE1znno, not fully sure why.

As for the CoroEarly pass, it lowers a bunch of intrinsics. Technically I think 
they can all be done in the front-end. But moving some complexity out of 
front-end to IR seems reasonable to me.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100282/new/

https://reviews.llvm.org/D100282

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D100282: [Coroutines] Move CoroEarly pass to before AlwaysInliner

2021-04-11 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
lxfind added reviewers: junparser, dongAxis1944, rjmccall, ChuanqiXu.
Herald added subscribers: hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

Presplit coroutines cannot be inlined. During AlwaysInliner we check if a 
function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute 
isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100282

Files:
  clang/test/CodeGenCoroutines/coro-always-inline-resume.cpp
  clang/test/CodeGenCoroutines/coro-always-inline.cpp
  llvm/lib/Passes/PassBuilder.cpp

Index: llvm/lib/Passes/PassBuilder.cpp
===
--- llvm/lib/Passes/PassBuilder.cpp
+++ llvm/lib/Passes/PassBuilder.cpp
@@ -1883,6 +1883,12 @@
   for (auto  : PipelineEarlySimplificationEPCallbacks)
 C(MPM, Level);
 
+  // CoroEarlyPass needs to run before AlwaysInliner to make sure we add
+  // proper attributes to coroutines first, so that Inliner won't inline
+  // coroutines.
+  if (PTO.Coroutines)
+MPM.addPass(createModuleToFunctionPassAdaptor(CoroEarlyPass()));
+
   // Build a minimal pipeline based on the semantics required by LLVM,
   // which is just that always inlining occurs. Further, disable generating
   // lifetime intrinsics to avoid enabling further optimizations during
@@ -1940,8 +1946,6 @@
   }
 
   if (PTO.Coroutines) {
-MPM.addPass(createModuleToFunctionPassAdaptor(CoroEarlyPass()));
-
 CGSCCPassManager CGPM(DebugLogging);
 CGPM.addPass(CoroSplitPass());
 CGPM.addPass(createCGSCCToFunctionPassAdaptor(CoroElidePass()));
Index: clang/test/CodeGenCoroutines/coro-always-inline.cpp
===
--- clang/test/CodeGenCoroutines/coro-always-inline.cpp
+++ clang/test/CodeGenCoroutines/coro-always-inline.cpp
@@ -1,54 +1,64 @@
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
-// RUN:   -fexperimental-new-pass-manager -O0 %s -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
-// RUN:   -fexperimental-new-pass-manager -fno-inline -O0 %s -o - | FileCheck %s
-
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
-// RUN:   -O0 %s -o - | FileCheck %s
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -fcoroutines-ts \
-// RUN:   -fno-inline -O0 %s -o - | FileCheck %s
-
-namespace std {
-namespace experimental {
-
-struct handle {};
-
-struct awaitable {
-  bool await_ready() noexcept { return true; }
-  // CHECK-NOT: await_suspend
-  inline void __attribute__((__always_inline__)) await_suspend(handle) noexcept {}
-  bool await_resume() noexcept { return true; }
-};
+// RUN: %clang -std=c++2a %s -emit-llvm -S -o - | FileCheck %s
 
-template 
-struct coroutine_handle {
-  static handle from_address(void *address) noexcept { return {}; }
-};
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+class task {
+public:
+  class promise_type {
+  public:
+task get_return_object() noexcept;
+coro::suspend_always initial_suspend() noexcept;
+void return_void() noexcept;
+void unhandled_exception() noexcept;
+
+struct final_awaiter {
+  bool await_ready() noexcept;
+  void await_suspend(coro::coroutine_handle h) noexcept;
+  void await_resume() noexcept;
+};
 
-template 
-struct coroutine_traits {
-  struct promise_type {
-awaitable initial_suspend() { return {}; }
-awaitable final_suspend() noexcept { return {}; }
-void return_void() {}
-T get_return_object() { return T(); }
-void unhandled_exception() {}
+final_awaiter final_suspend() noexcept;
+
+coro::coroutine_handle<> continuation;
   };
+
+  task(task &) noexcept;
+  ~task();
+
+  class awaiter {
+  public:
+bool await_ready() noexcept;
+void await_suspend(coro::coroutine_handle<> continuation) noexcept;
+void await_resume() noexcept;
+
+  private:
+friend task;
+explicit awaiter(coro::coroutine_handle h) noexcept;
+coro::coroutine_handle coro_;
+  };
+
+  awaiter operator co_await() &
+
+private:
+  explicit task(coro::coroutine_handle h) noexcept;
+  coro::coroutine_handle coro_;
 };
-} // namespace experimental
-} // namespace std
-
-// CHECK-LABEL: @_Z3foov
-// CHECK-LABEL: entry:
-// CHECK-NEXT: %this.addr.i{{[0-9]*}} = alloca %"struct.std::experimental::awaitable"*, align 8
-// CHECK-NEXT: %this.addr.i{{[0-9]*}} = alloca %"struct.std::experimental::awaitable"*, align 8
-// CHECK: [[CAST0:%[0-9]+]] = bitcast %"struct.std::experimental::awaitable"** 

[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGf490a5969bd5: [OpenMP][InstrProfiling] Fix a missing instr 
profiling counter (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple 
x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes 
-fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment(
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1034,7 +1034,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1043,6 +1043,8 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  if (S)
+CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes -fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment(
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1034,7 +1034,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1043,6 +1043,8 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  if (S)
+CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 333417.
lxfind added a comment.
This revision is now accepted and ready to land.

check null on S


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple 
x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes 
-fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment(
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1034,7 +1034,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1043,6 +1043,8 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  if (S)
+CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes -fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment(
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1034,7 +1034,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1043,6 +1043,8 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  if (S)
+CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc7a39c833af1: [Coroutine][Clang] Force emit lifetime 
intrinsics for Coroutines (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99227/new/

https://reviews.llvm.org/D99227

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-await-resume-eh.cpp
  clang/test/CodeGenCoroutines/coro-await.cpp
  clang/test/CodeGenCoroutines/coro-dest-slot.cpp
  clang/test/CodeGenCoroutines/coro-params.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp

Index: clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp
===
--- clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp
+++ clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp
@@ -50,6 +50,8 @@
 // CHECK: [[TRYCONT]]:
 // CHECK-NEXT: br label %[[COROFIN:.+]]
 // CHECK: [[COROFIN]]:
+// CHECK-NEXT: bitcast %"struct.std::experimental::coroutines_v1::suspend_never"* %{{.+}} to i8*
+// CHECK-NEXT: call void @llvm.lifetime.start.p0i8(
 // CHECK-NEXT: call void @"?final_suspend@promise_type@coro_t@@QEAA?AUsuspend_never@coroutines_v1@experimental@std@@XZ"(
 
 // CHECK-LPAD: @_Z1fv(
@@ -69,4 +71,6 @@
 // CHECK-LPAD: [[TRYCONT]]:
 // CHECK-LPAD: br label %[[COROFIN:.+]]
 // CHECK-LPAD: [[COROFIN]]:
+// CHECK-LPAD-NEXT: bitcast %"struct.std::experimental::coroutines_v1::suspend_never"* %{{.+}} to i8*
+// CHECK-LPAD-NEXT: call void @llvm.lifetime.start.p0i8(
 // CHECK-LPAD-NEXT: call void @_ZN6coro_t12promise_type13final_suspendEv(
Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O0 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -50,8 +50,13 @@
 
 // check that the lifetime of the coroutine handle used to obtain the address is contained within single basic block, and hence does not live across suspension points.
 // CHECK-LABEL: final.suspend:
-// CHECK: %[[PTR1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP:.+]] to i8*
-// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
-// CHECK: call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* {{[^,]*}} %[[ADDR_TMP]])
-// CHECK-NEXT:%[[PTR2:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] to i8*
-// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
+// CHECK: %{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK: %[[HDL_CAST1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL:.+]] to i8*
+// CHECK: call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HDL_CAST1]])
+// CHECK: %[[CALL:.+]] = call i8* @_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(
+// CHECK: %[[HDL_CAST2:.+]] = getelementptr inbounds %"struct.std::experimental::coroutines_v1::coroutine_handle.0", %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL]], i32 0, i32 0
+// CHECK: store i8* %[[CALL]], i8** %[[HDL_CAST2]], align 8
+// CHECK: %[[HDL_TRANSFER:.+]] = call i8* @_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull dereferenceable(8) %[[HDL]])
+// CHECK: %[[HDL_CAST3:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL]] to i8*
+// CHECK: call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HDL_CAST3]])
+// CHECK: call void @llvm.coro.resume(i8* %[[HDL_TRANSFER]])
Index: clang/test/CodeGenCoroutines/coro-params.cpp
===
--- clang/test/CodeGenCoroutines/coro-params.cpp
+++ clang/test/CodeGenCoroutines/coro-params.cpp
@@ -70,7 +70,11 @@
 
   // CHECK: call i8* @llvm.coro.begin(
   // CHECK: call void @_ZN8MoveOnlyC1EOS_(%struct.MoveOnly* {{[^,]*}} %[[MoCopy]], %struct.MoveOnly* nonnull align 4 dereferenceable(4) %[[MoParam]])
+  // CHECK-NEXT: bitcast %struct.MoveAndCopy* %[[McCopy]] to i8*
+  // CHECK-NEXT: call void 

[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98135#2650940 , @ABataev wrote:

> In D98135#2650914 , @lxfind wrote:
>
>> @ABataev, wondering if you have a timeline on this?
>> Missing counters from OMP functions sometimes cause llvm-cov to crash 
>> because of data inconsistency.
>
> Cannot answer right now. It would be much easier to fix this if you could 
> provide additional info about what tests are exactly failed, what are the 
> constructs that do not support it, etc.

Yes the whole pipeline is a bit long and complex, so I don't have an exact 
repro in hand because it requires source code and run it.

But let me try to explain what happened in my observation. There are two 
sections that are related to this issue in the binary, the IPSK_covfun section 
that contains the function records, and the IPSK_name section that contains the 
list of all function names. The issue here is that some OMP functions that are 
found in the IPSK_covfun section are not found in the IPSK_name section.

The records in IPSK_covfun are generated like this:

Whenever CodeGenFunction is generating code for any function, it will first 
call the `CodeGenFunction::GenerateCode()` function, in which it will call 
`PGO.assignRegionCounters(GD, CurFn);`: 
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenFunction.cpp#L1329

From there, it will call `emitCounterRegionMapping`:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenPGO.cpp#L819

which will then call: `CGM.getCoverageMapping()->addFunctionMappingRecord`:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CodeGenPGO.cpp#L890

which will eventually add this function to a `FunctionRecords`:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/clang/lib/CodeGen/CoverageMappingGen.cpp#L1655

So all above is to show that every function will eventually be added to 
`FunctionRecords`, unless in the case where a function is explicitly marked as 
unused. The `FunctionRecords` will eventually be all written into the 
`IPSK_covfun` section in the binary.

The names in IPSK_name section are generated like this:
Within InstrProfiling.cpp 
(https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp),
 it collects all the function names referenced in all instrumentation counter 
increments instructions:
https://github.com/llvm/llvm-project/blob/e5f51fdd650c6d20c81fedb8e856e9858aa10991/llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp#L920
Basically for every InstrProfIncrementInst, it adds the function name 
referenced in it to a list `ReferencedNames`.
Then in the end this `ReferencedNames` is written into the IPSK_name section.

Now you can see that for the OMP functions  that don't have any counters, they 
are still added to `FunctionRecords`, but not added to `ReferencedNames`, 
because they are not referenced by any InstrProfIncrementInst.
During the running of llvm-cov, when reading the list of function records, it 
will attempt to look up the name of the function from the function name list:
https://github.com/llvm/llvm-project/blob/b9ff67099ad6da931976e66f1510c5af2558a86e/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L560

And it will not be able to find it for the OMP case, so it will return an error.

Overall this is very complex and a bit fragile to me. For instance, we probably 
could have detected the error much earlier during Instrumentation pass in LLVM, 
that some function records' names are not in the name list. Or we could simply 
construct the list of function names based on the function records. But 
currently these two are generated independently.
cc @MaskRay and @vsk, maybe they have thoughts on this.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
lxfind abandoned this revision.
lxfind added a comment.

Abandoning in favor of D99227 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

@ABataev, wondering if you have a timeline on this?
Missing counters from OMP functions sometimes cause llvm-cov to crash because 
of data inconsistency.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-25 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 38.
lxfind added a comment.

Address comments, and fix all tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99227/new/

https://reviews.llvm.org/D99227

Files:
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/lib/CodeGen/CodeGenFunction.h
  clang/test/CodeGenCoroutines/coro-alloc.cpp
  clang/test/CodeGenCoroutines/coro-await-resume-eh.cpp
  clang/test/CodeGenCoroutines/coro-await.cpp
  clang/test/CodeGenCoroutines/coro-dest-slot.cpp
  clang/test/CodeGenCoroutines/coro-params.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp

Index: clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp
===
--- clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp
+++ clang/test/CodeGenCoroutines/coro-unhandled-exception.cpp
@@ -50,6 +50,8 @@
 // CHECK: [[TRYCONT]]:
 // CHECK-NEXT: br label %[[COROFIN:.+]]
 // CHECK: [[COROFIN]]:
+// CHECK-NEXT: bitcast %"struct.std::experimental::coroutines_v1::suspend_never"* %{{.+}} to i8*
+// CHECK-NEXT: call void @llvm.lifetime.start.p0i8(
 // CHECK-NEXT: call void @"?final_suspend@promise_type@coro_t@@QEAA?AUsuspend_never@coroutines_v1@experimental@std@@XZ"(
 
 // CHECK-LPAD: @_Z1fv(
@@ -69,4 +71,6 @@
 // CHECK-LPAD: [[TRYCONT]]:
 // CHECK-LPAD: br label %[[COROFIN:.+]]
 // CHECK-LPAD: [[COROFIN]]:
+// CHECK-LPAD-NEXT: bitcast %"struct.std::experimental::coroutines_v1::suspend_never"* %{{.+}} to i8*
+// CHECK-LPAD-NEXT: call void @llvm.lifetime.start.p0i8(
 // CHECK-LPAD-NEXT: call void @_ZN6coro_t12promise_type13final_suspendEv(
Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O0 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -50,8 +50,13 @@
 
 // check that the lifetime of the coroutine handle used to obtain the address is contained within single basic block, and hence does not live across suspension points.
 // CHECK-LABEL: final.suspend:
-// CHECK: %[[PTR1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP:.+]] to i8*
-// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
-// CHECK: call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* {{[^,]*}} %[[ADDR_TMP]])
-// CHECK-NEXT:%[[PTR2:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] to i8*
-// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
+// CHECK: %{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK: %[[HDL_CAST1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL:.+]] to i8*
+// CHECK: call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HDL_CAST1]])
+// CHECK: %[[CALL:.+]] = call i8* @_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(
+// CHECK: %[[HDL_CAST2:.+]] = getelementptr inbounds %"struct.std::experimental::coroutines_v1::coroutine_handle.0", %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL]], i32 0, i32 0
+// CHECK: store i8* %[[CALL]], i8** %[[HDL_CAST2]], align 8
+// CHECK: %[[HDL_TRANSFER:.+]] = call i8* @_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull dereferenceable(8) %[[HDL]])
+// CHECK: %[[HDL_CAST3:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL]] to i8*
+// CHECK: call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HDL_CAST3]])
+// CHECK: call void @llvm.coro.resume(i8* %[[HDL_TRANSFER]])
Index: clang/test/CodeGenCoroutines/coro-params.cpp
===
--- clang/test/CodeGenCoroutines/coro-params.cpp
+++ clang/test/CodeGenCoroutines/coro-params.cpp
@@ -70,7 +70,11 @@
 
   // CHECK: call i8* @llvm.coro.begin(
   // CHECK: call void @_ZN8MoveOnlyC1EOS_(%struct.MoveOnly* {{[^,]*}} %[[MoCopy]], %struct.MoveOnly* nonnull align 4 dereferenceable(4) %[[MoParam]])
+  // CHECK-NEXT: bitcast %struct.MoveAndCopy* %[[McCopy]] to i8*
+  // CHECK-NEXT: call void @llvm.lifetime.start.p0i8(
   // CHECK-NEXT: call void @_ZN11MoveAndCopyC1EOS_(%struct.MoveAndCopy* {{[^,]*}} %[[McCopy]], %struct.MoveAndCopy* 

[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-24 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D99227#2646819 , @rjmccall wrote:

> Is it feasible to outline the initial segment that you don't want to be part 
> of the coroutine, and then have coroutine splitting force that outlined 
> function to be inlined into the ramp function?  IIUC, you were saying that 
> the splitting patch was difficult, but maybe thinking about it as outlining 
> simplifies things.  I know we had some nasty representational problems with 
> the async lowering that we solved with outlining and force-inlining.

That's a good idea. I will think about it. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99227/new/

https://reviews.llvm.org/D99227

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-23 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/lib/CodeGen/CGDecl.cpp:1318
 /// otherwise
 llvm::Value *CodeGenFunction::EmitLifetimeStart(uint64_t Size,
 llvm::Value *Addr) {

ChuanqiXu wrote:
> Can we sure frontend would always call this API to emit lifetime start? I 
> mean the frontend may call EmitIntrinsic or create lifetime.start intrinsic 
> directly whether by IRBuilder::CreateXXX or Instrinsic::Create(...). I worry 
> about if this would incur changes out of design.
> 
> Then if we add check in EmitLifetimeStart, why not we add check in 
> EmitLfietimeEnd?
I searched in the codebase, and we always call this API to emit lifetime start 
in the front-end.
Also, for coroutine to behave correctly, we really only need SD_FullExpression 
to be able to emit it. Other cases are less critical.

Usually when it emits a LifetimeStart instruction, it will store it somewhere, 
and latter check on it to decide whether it needs to emit a lifetime end. 
That's when there is no checks needed for lifetime end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99227/new/

https://reviews.llvm.org/D99227

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-23 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D99227#2646568 , @ChuanqiXu wrote:

> Only one problem I had for emitting lifetime markers even at O0 is that would 
> action make allocas to be optimized even at O0? If so, I wonder if it 
> confuses programmers since they may find some variables disappear 
> surprisingly. Or there would be no optimization since every function would be 
> marked with optnone attribute. I am not sure about this.

It will only cause variables to be put on the stack instead of on the frame, 
which shouldn't affect developer's view?

> If I understand this problem correctly, this patch could fix problems for the 
> return value of symmetric transfer and the gro that we discussed in D98638 
> . Then D98638 
>  may be unneeded. I prefer the 
> implementation in this patch.

I doubt it can fix the gro problem. I will need to double check on that latter.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99227/new/

https://reviews.llvm.org/D99227

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-23 Thread Xun Li via Phabricator via cfe-commits
lxfind added a subscriber: lewissbaker.
lxfind added a comment.

> I think you just set `ShouldEmitLifetimeMarkers` correctly in the first place 
> instead of adding this as an extra condition to every place that considers 
> it, however.

This was set when a CodeGenFunction is constructed, at that point it doesn't 
yet know if this function is a coroutine.
I could turn ShouldEmitLifetimeMarkers to non-const, and then modify it once it 
realizes it's a coroutine though, if that's better than the current approach.

> Sorry, I re-read this after posting, and it's not exactly clear what I was 
> saying.  There are a lot of situations where Clang doesn't emit lifetime 
> intrinsics for every `alloca` it emits, or emits unnecessarily weak bounds.  
> Certain LLVM transforms can also introduce `alloca`s that don't have 
> corresponding lifetime intrinsics.  So I think it's problematic to consider 
> it a correctness condition that we're emitting optimally-tight lifetimes.

I tend to agree. Relying on lifetime for correctness seems fragile.
I wonder if there is a better way to inform optimizer that a "variable" is 
really a temporary value that should die at the end of an expression?
For instance, whenever we do something simple like:

  foo().bar();
  co_await ...

If we compile it under -O0 without lifetime intrinsics, the return value of 
`foo()` will always be put on the coroutine frame, unless the compiler knows in 
advance that `bar()` does not capture.
This becomes a problem if this code appears at a location where the current 
coroutine frame may be destroyed (but the code itself isn't wrong, it simply 
doesn't access the frame).
The case for symmetric transfer is exactly this situation.

An alternative to solve the problem for the case of symmetric transfer, is to 
change the design of symmetric transfer. For example, if we let `await_suspend` 
to return `void*` instead of `coroutine_handle`, we won't have this problem in 
the first place, because we no longer need to call `address()`. Maybe 
@lewissbaker can comment on the viability of that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99227/new/

https://reviews.llvm.org/D99227

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D99227: [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines

2021-03-23 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei.
lxfind requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

tl;dr Correct implementation of Corouintes requires having lifetime intrinsics 
available.

Coroutine functions are functions that can be suspended and resumed latter. To 
do so, data that need to stay alive after suspension must be put on the heap 
(i.e. the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figure out 
whether it should be put on the stack or the frame.
In most cases, for data that we are unable to accurately analyze lifetime, we 
can just conservatively put them on the heap.
Unfortunately, there exists a few cases where certain data MUST be put on the 
stack, not on the heap. Without lifetime intrinsics, we are unable to correctly 
analyze those data's lifetime.

To dig into more details, there exists cases where at certain code points, the 
current coroutine frame may have already been destroyed. Hence no frame access 
would be allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutine:

  auto tmp = await_suspend();
  __builtin_coro_resume(tmp.address());
  return;

In the above code example, `await_suspend()` returns a new coroutine handle, 
which we will obtain the address and then resume that coroutine. This 
essentially "transfered" from the current coroutine to a different coroutine.
During the call to `await_suspend()`, the current coroutine may be destroyed, 
which should be fine because we are not accessing any data afterwards.
However when LLVM is emitting IR for the above code, it needs to emit an 
AllocaInst for `tmp`. It will then call the `address` function on tmp. 
`address` function is a member function of coroutine, and there is no way for 
the LLVM optimizer to know that it does not capture the `tmp` pointer. So when 
the optimizer looks at it, it has to conservatively assume that `tmp` may 
escape and hence put it on the heap. Furthermore, in some cases `address` call 
would be inlined, which will generate a bunch of store/load instructions that 
move the `tmp` pointer around. Those stores will also make the compiler to 
think that `tmp` might escape.
To summarize, it's really difficult for the mid-end to figure out that the 
`tmp` data is short-lived.
I made some attempt in D98638 , but it appears 
to be way too complex and is basically doing the same thing as inserting 
lifetime intrinsics in coroutines.

Also, for reference, we already force emitting lifetime intrinsics in O0 for 
AlwaysInliner: 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99227

Files:
  clang/lib/CodeGen/CGDecl.cpp
  clang/lib/CodeGen/CGExpr.cpp
  clang/lib/CodeGen/CodeGenFunction.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp


Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 
-O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 
-O0 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -50,8 +50,13 @@
 
 // check that the lifetime of the coroutine handle used to obtain the address 
is contained within single basic block, and hence does not live across 
suspension points.
 // CHECK-LABEL: final.suspend:
-// CHECK: %[[PTR1:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* 
%[[ADDR_TMP:.+]] to i8*
-// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
-// CHECK: call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 {{[^,]*}} %[[ADDR_TMP]])
-// CHECK-NEXT:%[[PTR2:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] 
to i8*
-// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
+// CHECK: %{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK: %[[HDL_CAST1:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[HDL:.+]] to 
i8*
+// CHECK: call void @llvm.lifetime.start.p0i8(i64 8, i8* 
%[[HDL_CAST1]])
+// CHECK: %[[CALL:.+]] = call i8* 
@_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(
+// CHECK: %[[HDL_CAST2:.+]] = getelementptr inbounds 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0", 

[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-21 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

@bruno  Thanks for the review!




Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

bruno wrote:
> ChuanqiXu wrote:
> > lxfind wrote:
> > > ChuanqiXu wrote:
> > > > ChuanqiXu wrote:
> > > > > can we rewrite it into:
> > > > > ```
> > > > > else if (SuspendRet != nullptr && 
> > > > > SuspendRet->getType()->isClassType()) {
> > > > >  // generate:
> > > > >  // llvm.coro.forcestack(SuspendRet)
> > > > > }
> > > > > ```
> > > > Sorry I find we can't did it directly. As you said, we need to traverse 
> > > > down SuspendRet. And I still think we should did it only at CodeGen 
> > > > part since it looks not so hard. I guess we could make it in above 
> > > > 10~15 lines of codes.
> > > Traversing down AST isn't the hard part. The hard part is to search the 
> > > emitted IR, and look for the temporary alloca used to store the returned 
> > > handle.
> > Yes, I get your point. If we want to traverse the emitted IR, we could only 
> > search for the use-chain backward, which is also very odd. Let's see if 
> > there is other ways to modify the ASTNodes to make it more naturally.
> I'm curious whether did you consider annotating instructions with some new 
> custom metadata instead of using intrinsics? If so, what would be the 
> tradeoff? For example, if you could conditionally attach metadata some 
> "begin" metadata here:
> 
> `auto *SaveCall = Builder.CreateCall(CoroSave, {NullPtr});`
> 
> and "end" metadata here:
> 
> `auto *SuspendResult = Builder.CreateCall(CoroSuspend, {SaveCall, 
> Builder.getInt1(IsFinalSuspend)});`
The "end" part could probably be done through metadata. But I'm not sure how to 
do it for the "begin" part. The "begin" part needs to happen after the emission 
of S.getAwaitSuspendCallExpr().



Comment at: llvm/lib/Transforms/Coroutines/CoroFrame.cpp:2085
+if (auto *II = dyn_cast())
+  if (II->getIntrinsicID() == llvm::Intrinsic::coro_forcestack_begin) {
+assert(II->getNumUses() == 1 &&

bruno wrote:
> Do such intrinsics never get removed? What happens when this hits a backend?
They are added to the list of DeadInstructions after collected. So they will 
all be removed at the end of the pass.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-17 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

I am not sure how this would work, maybe I am missing something.
But this patch tries to round up the frame pointer by looking at the difference 
between the alignment of new and the alignment of the frame.
The alignment of new only gives you the guaranteed alignment for new, but not 
necessarily the maximum alignment, e.g. if the alignment of new is 16, the 
returned pointer can still be a multiple 32. And that difference matters.

Let's consider a frame that only has the two pointers and a promise with 
alignment requirement of 64. The alignment of new is 16.
Now you will calculate the difference to be 48, and create a padding of 48 
before the frame:
But if the returned pointer from new is actually a multiple of 32 (but not 64), 
the frame will no longer be aligned to 64 (but (32 + 48) % 64 = 16).
So from what I can tell, if we cannot pass alignment to new, we need to look at 
the address returned by new dynamically to decide the padding.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/lib/CodeGen/CGCoroutine.cpp:221
 CGF.EmitBlock(RealSuspendBlock);
+  } else if (ForcestackStart) {
+Builder.CreateCall(

ChuanqiXu wrote:
> ChuanqiXu wrote:
> > can we rewrite it into:
> > ```
> > else if (SuspendRet != nullptr && SuspendRet->getType()->isClassType()) {
> >  // generate:
> >  // llvm.coro.forcestack(SuspendRet)
> > }
> > ```
> Sorry I find we can't did it directly. As you said, we need to traverse down 
> SuspendRet. And I still think we should did it only at CodeGen part since it 
> looks not so hard. I guess we could make it in above 10~15 lines of codes.
Traversing down AST isn't the hard part. The hard part is to search the emitted 
IR, and look for the temporary alloca used to store the returned handle.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-17 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Can't we did as inline comments?

No, because it would have already been too late. SuspendExpr returns the result 
of __builtin_coro_resume(awaiter.await_suspend().address()), which is different 
from the result of awaiter.await_suspend().
We need to be able to control the placement of awaiter.await_suspend(), which 
is why I had to break up the AST at that boundary.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Then if we want to put the result of the await_suspend in the stack, I think 
> we can do it under CodeGen part only. It should be easy to judge the return 
> type of await_suspend and create a call to llvm.coro.forcestack to the return 
> value of await_suspend.

We probably could, but it would be very very tedious. 
During CodeGen, we only have the AST that's calling __builtin_coro_resume, 
which we will call Emit as a whole.
So we need to manually match the AST 2 levels down to find the await_suspend 
call, get its name, and then walk through the emitted IR to find a call with 
the same name, and then find the tmp that's used to store the return value of 
the call, and then emit llvm.coro.forcestack.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Then we need to answer the question: how can we **prove** that the result of 
> symmetric transfer and %gro are the **only** exceptions from the above rules. 
> Or how can we know the list of exceptions wouldn't get longer and longer in 
> the future?
>
> Then go back to the example in the summary. From my point of view, the key 
> problem is that our escape analysis isn't powerful enough. I don't ask us to 
> do excellent escape analysis. It may beyond our abilities. I just want to say 
> how can we know the result of symmetric transfer and %gro are the only 
> exceptions.

That's a fair point. I agree that we have no guarantee these are the only two 
cases.
It does seem to me that coroutine implementation somewhat relies on proper 
lifetime markers so that data are being put correctly, which may be the 
fundamental problem we are trying to solve.

> In D98638#2630778 , @lxfind wrote:
>
>> Whether or not the current coroutine frame would be destroyed completely 
>> depend on the implementation of await_suspend. So we cannot predict or know 
>> in advance. Therefore, the temporary handle returned by await_suspend must 
>> be put on the stack. I don't really see any other solutions other than this.
>
> OK. Although the main stream implementation of await_suspend only destroy the 
> coroutine handle in the final awaiter, the compiler can't assume the normal 
> await_suspend won't destroy it. So I agree to guard the result of the 
> await_suspend to make it put on the stack. At least, it would reduce the size 
> of the coroutine frame.
>
> Then if we want to put the result of the await_suspend in the stack, I think 
> we can do it under CodeGen part only. It should be easy to judge the return 
> type of await_suspend and create a call to llvm.coro.forcestack to the return 
> value of await_suspend.
>
> In D98638#2630778 , @lxfind wrote:
>
>> Well, I guess another potential solution is to force emitting lifetime 
>> intrinsics for this part of coroutine in the front-end.
>
> I am not sure if this is a good idea. May it break the guide principle in 
> LLVM? This need to be reviewed by others.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Well, I guess another potential solution is to force emitting lifetime 
intrinsics for this part of coroutine in the front-end.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

> Here what I want to say is we **shouldn't**  handle all the symmetric 
> transfer from the above analysis. And we shouldn't change the ASTNodes and 
> Sema part. We need to solve about the above pattern. It is not easy to give a 
> solution since user could implement symmetric transfer in final awaiter 
> without destroying the handle, which is more common.

Just to clarify, in case there are any confusions around this. This patch would 
work no matter whether the coroutine frame is destroyed or not during 
await_suspend(). It simply makes sure that the temporary handle returned by 
await_suspend will be put in the stack instead of heap, and it will always be 
safe to do so, no matter what happens.
Whether or not the current coroutine frame would be destroyed completely depend 
on the implementation of await_suspend. So we cannot predict or know in 
advance. Therefore, the temporary handle returned by await_suspend must be put 
on the stack. I don't really see any other solutions other than this.

> It seems to be a workaround to use 
> @llvm.coro.forcestack(%result_of_final_await_suspend) . Since I wondering if 
> there are other corner cases as the %gro. My opinion about 
> '@llvm.coro.forcestack' is that we could use it as a patch if we find any 
> holes that is hard to handle immediately. But we also need to find a solution 
> to solve problems more fundamentally.

Yes as I mentioned in the description, there are really only two cases, one is 
after await_suspend call, and one is gro. gro is easy to handle and I will 
likely send a separate patch latter. But this problem with await_suspend is 
particularly challenging to solve.

What do you think is the fundamental problem, though?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/include/clang/AST/ExprCXX.h:4695
+/// afterwards on the stack.
 class CoroutineSuspendExpr : public Expr {
   friend class ASTStmtReader;

lxfind wrote:
> ChuanqiXu wrote:
> > It looks strange for the change of `CoroutineSuspendExpr` at the first 
> > glance. It is easy to understand the coroutine suspend expression is 
> > consists of three parts: Ready, Suspend and resume. It is written in the 
> > language documentation. And the new added AwaitSuspendCall is confusing.
> I agree. But this seems to be the only way to break up Suspend at the point 
> of await_suspend call so that we can insert instructions during CodeGen. Open 
> to ideas though.
One potential way to make this more clear is to rename these two nodes as: 
Suspend and Transfer.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98638#2628082 , @ChuanqiXu wrote:

> I am a little confused about the first problem. Would it cause the program to 
> crash? (e.g., we access the fields of coroutine frame after the frame gets 
> destroyed). Or it just wastes some storage?

This is a repro of the crash (in TSAN mode): https://godbolt.org/z/KvPY66


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/include/clang/AST/ExprCXX.h:4695
+/// afterwards on the stack.
 class CoroutineSuspendExpr : public Expr {
   friend class ASTStmtReader;

ChuanqiXu wrote:
> It looks strange for the change of `CoroutineSuspendExpr` at the first 
> glance. It is easy to understand the coroutine suspend expression is consists 
> of three parts: Ready, Suspend and resume. It is written in the language 
> documentation. And the new added AwaitSuspendCall is confusing.
I agree. But this seems to be the only way to break up Suspend at the point of 
await_suspend call so that we can insert instructions during CodeGen. Open to 
ideas though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-16 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98638#2628082 , @ChuanqiXu wrote:

> It looks like there are two things this patch wants to do:
>
> 1. Don't put the temporary generated by symmetric-transfer on the coroutine 
> frame.
> 2. Offer a mechanism to force some values (it is easy to extend Alloca to 
> Value) to put in the stack instead of the coroutine frame.
>
> I am a little confused about the first problem. Would it cause the program to 
> crash? (e.g., we access the fields of coroutine frame after the frame gets 
> destroyed). Or it just wastes some storage?
> And I want to ask about the change of the AST nodes and SemaCoroutine. Can we 
> know if a CoroutineSuspendExpr stands for a symmetric-transfer? If yes, it 
> seems we can only do changes in CodeGen part.

It will result in a crash, because we will be accessing memory that's already 
freed.  If you run:

  bin/clang -fcoroutines-ts -std=c++14 -stdlib=libc++ 
../clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp -o - -emit-llvm 
-S -Xclang -disable-llvm-passes

You can see that in the `final.suspend` basic block, there are IRs like this:

%call19 = call i8* 
@_ZN13detached_task12promise_type13final_awaiter13await_suspendENSt12experimental13coroutines_v116coroutine_handleIS0_EE(%"struct.detached_task::promise_type::final_awaiter"*
 nonnull dereferenceable(1) %ref.tm
  p10, i8* %22) #2
%coerce.dive20 = getelementptr inbounds 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0", 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %coerce, i32 0, 
i32 0
store i8* %call19, i8** %coerce.dive20, align 8
%call21 = call i8* 
@_ZNKSt12experimental13coroutines_v116coroutine_handleIvE7addressEv(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 nonnull dereferenceable(8) %coerce) #2
call void @llvm.coro.resume(i8* %call21)

The temporary variable %coerce will be put on the frame because it's used by 
the call to `address` function and LLVM thinks it may escape. But the call to 
await_suspend() (the first line) in reality could destroy the current coroutine 
frame. Hence after the call to await_suspend, it will be accessing the frame, 
leading to memory corruption.

> Then I agree to introduce new intrinsic to hint the middle end to put some 
> values on the stack. And the design of `@llvm.coro.forcestack.begin()` and 
> `@llvm.coro.forcestack.end()` is a little strange to me. It says they mark a 
> region where only data from the local stack can be accessed. But it looks 
> error-prone since it is hard for the front-end to decide whether all the 
> access of the region should be put on the stack. I think we could introduce 
> only one intrinisic `@llvm.coro.forcestack(Value* v)`, we can use the 
> argument to mark the value need to be put on the stack.

This is a good idea. Let me play with it. Thanks!

> And about the problem you mentioned in D96922 
> : "The lifetime of  %coro.gro" starts early 
> and %coro.gro" would be used after `coro.end` (Possibly the destructor?) 
> which would cause the program to access destroyed coroutine frame". It looks 
> like the mechanism could solve this problem by a call to 
> `@llvm.coro.forcestack(%coro.gro)`.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98638/new/

https://reviews.llvm.org/D98638

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98638: [RFC][Coroutine] Force stack allocation after await_suspend() call

2021-03-15 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: ChuanqiXu, hoy, modimo, wenlei, hiraditya.
lxfind requested review of this revision.
Herald added subscribers: llvm-commits, cfe-commits, jdoerfert.
Herald added projects: clang, LLVM.

One of the challenges with the alloca analysis in CoroSplit is that in a few 
cases we need to make sure the allocas must be put on the stack, not on the 
frame.
One of the cases is symmetric transfer. Symmetric transfer is a newly 
introduced feature in C++ coroutines that allows for immediate transfer to a 
different coroutine when the current coroutine is suspended.
The await_suspend() call will return a coroutine handle type, and when that 
happens, the compiler should generate code to resume the returned handle. Like 
this:

  coroutine_handle tmp = awaiter.await_suspend();
  __builtin_coro_resume(tmp.address());

It's very common that after the call to await_suspend(), the current coroutine 
frame is already destroyed, which means we should not be accessing the 
coroutine frame from there.
And we shouldn't because we we use here is a temporary variable which will be 
short-lived. However in a debug build when we don't have lifetime intrinsics, 
it's very hard for 
the compiler to determine that tmp doesn't escape. There are two specific 
challenges here:

1. If the address() function call is not inlined (this should be the default 
case with -O0), we will have a function call that takes tmp as a pointer. The 
compiler does not know that the address call will not capture. This will lead 
to tmp being put on the frame. We could potentially special handle the address 
function in either front-end or CoroSplit, but both are fragile (we will need 
to do some name pattern matching).
2. If the address() function call is inlined (in some versions of libc++, 
address seems to have "always_inline" attribute), we will end up with a series 
of store/load instructions. For a naive analysis, a store of the pointer will 
also be treated as escape. To solve that problem, I introduced D91305 
, which tries to match this specific 
store/load pattern and be able to deal with it. It looks very hacky.

To solve this problem once for all, and provide a framework for solving similar 
problems in the future, this patch introduces 2 new intrinsics to mark a region 
where all data accessed must be put on the stack.
In the case of symmetric transfer, in order to be able to insert code during 
front-end codegen right after the await_suspend call, we need to split the 
Suspend subnode CoroutineSuspendExpr at await_suspend call, as the new 
AwaitSuspendCall subnode.
Then we create a OpaqueValueExpr to wrap around AwaitSuspendCall, and use it to 
continue build the rest of the Suspend subnode. OpaqueValueExpr is necessary 
because we don't want to emit the await_suspend call twice. OpaqueValueExpr 
serves as a stopper in codegen.
If there is no symmetric transfer, the new nodes will be nullptr.
After this patch, now right after the await_suspend() call, we will see a 
llvm.coro.forcestack.begin() intrinsic, and then right before coro.suspend(), 
we will see a llvm.coro.forcestack.end() intrinsic.
CoroSplit will then be able to use this information to decide whether some data 
must be put on the stack.
We are also able to remove the code that tries to match the special store/load 
instruction sequence.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98638

Files:
  clang/include/clang/AST/ExprCXX.h
  clang/lib/CodeGen/CGCoroutine.cpp
  clang/lib/Sema/SemaCoroutine.cpp
  clang/lib/Serialization/ASTReaderStmt.cpp
  clang/lib/Serialization/ASTWriterStmt.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
  llvm/docs/Coroutines.rst
  llvm/include/llvm/IR/Intrinsics.td
  llvm/lib/Transforms/Coroutines/CoroFrame.cpp
  llvm/test/Transforms/Coroutines/coro-alloca-06.ll

Index: llvm/test/Transforms/Coroutines/coro-alloca-06.ll
===
--- llvm/test/Transforms/Coroutines/coro-alloca-06.ll
+++ llvm/test/Transforms/Coroutines/coro-alloca-06.ll
@@ -1,5 +1,5 @@
-; Test that in some simple cases allocas will not live on the frame even
-; though their pointers are stored.
+; Test that even though some stores may seem to escape pointers,
+; they can be put on the stack as long as they are within forcestack range.
 ; RUN: opt < %s -coro-split -S | FileCheck %s
 ; RUN: opt < %s -passes=coro-split -S | FileCheck %s
 
@@ -19,14 +19,12 @@
   %2 = call i8* @await_suspend()
   %3 = getelementptr inbounds %"handle", %"handle"* %0, i32 0, i32 0
   store i8* %2, i8** %3, align 8
-  %4 = bitcast %"handle"** %1 to i8*
-  call void @llvm.lifetime.start.p0i8(i64 8, i8* %4)
+  %4 = call i8* @llvm.coro.forcestack.begin()
   store %"handle"* %0, %"handle"** %1, align 8
   %5 = load %"handle"*, %"handle"** %1, align 8
   %6 = getelementptr 

[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-10 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98135#2617448 , @ABataev wrote:

> In D98135#2617446 , @lxfind wrote:
>
>> In D98135#2617432 , @ABataev wrote:
>>
>>> There is a problem. We actually do not emit `S` here directly, instead, we 
>>> use `CodeGen` lambdas, which may not be equal to `S`, in some cases `S` is 
>>> `nullptr` here. It may result in not quite accurate results.
>>
>> Thanks for the note! 
>> I don't really know anything about OMP though, not sure how to handle it. 
>> Would you mind taking a look at this issue? Feel free to send a different 
>> patch!
>
> Could you create a PR for the problem so we could track it?

https://bugs.llvm.org/show_bug.cgi?id=49521


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-10 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D98135#2617432 , @ABataev wrote:

> There is a problem. We actually do not emit `S` here directly, instead, we 
> use `CodeGen` lambdas, which may not be equal to `S`, in some cases `S` is 
> `nullptr` here. It may result in not quite accurate results.

Thanks for the note! 
I don't really know anything about OMP though, not sure how to handle it. 
Would you mind taking a look at this issue? Feel free to send a different patch!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-10 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Thanks. I am landing it. 
But feel free to comment here if anything isn't right. @ABataev


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-09 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 329517.
lxfind added a comment.

address comment


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple 
x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes 
-fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment(
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1028,7 +1028,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1037,6 +1037,7 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes -fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment(
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment(
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment(
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1028,7 +1028,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1037,6 +1037,7 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-06 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 328839.
lxfind added a comment.

add test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98135/new/

https://reviews.llvm.org/D98135

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp
  clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple 
x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes 
-fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1028,7 +1028,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1037,6 +1037,7 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }


Index: clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
===
--- /dev/null
+++ clang/test/OpenMP/omp_with_loop_pragma_instr_profile.c
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -verify -fopenmp -x c -emit-llvm %s -triple x86_64-unknown-linux -o - -femit-all-decls -disable-llvm-passes -fprofile-instrument=clang | FileCheck %s
+// expected-no-diagnostics
+
+void sub(double *restrict a, double *restrict b, int n) {
+  int i;
+
+#pragma omp parallel for
+#pragma clang loop vectorize(disable)
+  for (i = 0; i < n; i++) {
+a[i] = a[i] + b[i];
+  }
+}
+
+// CHECK-LABEL: @.omp_outlined.(
+// CHECK-NEXT:  entry:
+// CHECK: call void @llvm.instrprof.increment
+// CHECK:   omp.precond.then:
+// CHECK-NEXT:call void @llvm.instrprof.increment
+// CHECK:   cond.true:
+// CEHCK-NEXT:call void @llvm.instrprof.increment
+// CHECK:   omp.inner.for.body:
+// CHECK-NEXT:call void @llvm.instrprof.increment
Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1028,7 +1028,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1037,6 +1037,7 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D98135: [OpenMP][InstrProfiling] Fix a missing instr profiling counter

2021-03-06 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
lxfind added reviewers: ABataev, MaskRay.
Herald added subscribers: hoy, modimo, wenlei, guansong, yaxunl.
lxfind requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: cfe-commits, sstefan1.
Herald added a project: clang.

When emitting a function body there needs to be a instr profiling counter 
emitted. Otherwise instr profiling won't work for this function.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98135

Files:
  clang/lib/CodeGen/CGOpenMPRuntime.cpp


Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1028,7 +1028,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1037,6 +1037,7 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }


Index: clang/lib/CodeGen/CGOpenMPRuntime.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -1028,7 +1028,7 @@
   getThreadIDVariable()->getType()->castAs());
 }
 
-void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt * /*S*/) {
+void CGOpenMPRegionInfo::EmitBody(CodeGenFunction , const Stmt *S) {
   if (!CGF.HaveInsertPoint())
 return;
   // 1.2.2 OpenMP Language Terminology
@@ -1037,6 +1037,7 @@
   // The point of exit cannot be a branch out of the structured block.
   // longjmp() and throw() must not violate the entry/exit criteria.
   CGF.EHStack.pushTerminate();
+  CGF.incrementProfileCounter(S);
   CodeGen(CGF);
   CGF.EHStack.popTerminate();
 }
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D97915: [Coroutines] Handle overaligned frame allocation

2021-03-04 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Could you describe in more detail what problem this patch solves?
Also, since you are adding a new intrinsics, please also update the coroutine 
documentation regarding this new intrinsics.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97915/new/

https://reviews.llvm.org/D97915

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92662: [Clang][Coroutine] Drop const attribute on pthread_self when coroutine is enabled

2020-12-10 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D92662#2446777 , @jyknight wrote:

> I don't think we should change the meaning of `__attribute__((const))` to 
> exclude depending on thread-id.
>
> However, if we do want to do so, and call the existing uses of 
> `__attribute__((const))` in glibc invalid, we need to special case many more 
> functions. Looking through it a little, I see `__errno_location`, 
> `__rpc_thread_variables`, `__ctype_b_loc`, `__ctype_tolower_loc`, 
> `__ctype_toupper_loc`, `__libc_tsd_address`...and I gave up looking after 
> that.

Thanks for pointing it out. I didn't realize there are so many of them.
Your proposals in the llvm-dev thread sound very promising. Let me think them 
over.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92662/new/

https://reviews.llvm.org/D92662

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92662: [Clang][Coroutine] Drop const attribute on pthread_self when coroutine is enabled

2020-12-09 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D92662#2443970 , @MaskRay wrote:

> If the attribute is not suitable, glibc should drop it. The compiler can add 
> `readnone`/`readonly` if appropriate.

It's a C library interface though, and Coroutine is likely too new for this. I 
think that in the long run when Coroutine is pervasive we probably should drop 
it, but for now it's likely going to be hard for the glibc community to 
consider dropping it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92662/new/

https://reviews.llvm.org/D92662

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92662: [Clang][Coroutine] Drop const attribute on pthread_self when coroutine is enabled

2020-12-09 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 310599.
lxfind added a comment.

Fix test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92662/new/

https://reviews.llvm.org/D92662

Files:
  clang/lib/Sema/SemaDecl.cpp
  clang/test/CodeGenCoroutines/coro-pthread_self.cpp


Index: clang/test/CodeGenCoroutines/coro-pthread_self.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-pthread_self.cpp
@@ -0,0 +1,56 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 
-O3 -emit-llvm %s -o - | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+typedef void *pthread_t;
+pthread_t pthread_self(void) __attribute__((__const__));
+
+struct awaitable {
+  bool await_ready() { return false; }
+  void await_suspend(coro::coroutine_handle<> h);
+  void await_resume() {}
+};
+awaitable switch_to_new_thread();
+
+struct task {
+  struct promise_type {
+task get_return_object() { return {}; }
+coro::suspend_never initial_suspend() { return {}; }
+coro::suspend_never final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+  };
+};
+
+void check(pthread_t p1, pthread_t p2);
+
+task resuming_on_new_thread() {
+  auto pthread1 = pthread_self();
+  co_await switch_to_new_thread();
+  auto pthread2 = pthread_self();
+  check(pthread1, pthread2);
+}
+
+void non_coroutine() {
+  auto pthread1 = pthread_self();
+  check(pthread1, pthread1);
+  auto pthread2 = pthread_self();
+  check(pthread1, pthread2);
+}
+
+// CHECK-LABEL: define void @_Z13non_coroutinev()
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:%call = tail call i8* @_Z12pthread_selfv()
+// CHECK-NEXT:tail call void @_Z5checkPvS_(i8* %call, i8* %call)
+// CHECK-NEXT:%call1 = tail call i8* @_Z12pthread_selfv()
+// CHECK-NEXT:tail call void @_Z5checkPvS_(i8* %call, i8* %call1)
+// CHECK-NEXT:ret void
+// CHECK-NEXT:  }
+
+// CHECK-LABEL: define internal fastcc void @_Z22resuming_on_new_threadv.resume
+// CHECK: %[[RELOAD_ADDR:.+.reload.addr]] = getelementptr inbounds 
%_Z22resuming_on_new_threadv.Frame, %_Z22resuming_on_new_threadv.Frame* 
%FramePtr, i64 0, i32 {{.+}}
+// CHECK: %[[RELOAD:.+]] = load i8*, i8** %[[RELOAD_ADDR]], align 8
+// CHECK: %[[CALL:.+]] = tail call i8* @_Z12pthread_selfv()
+// CHECK: tail call void @_Z5checkPvS_(i8* %[[RELOAD]], i8* %[[CALL]])
Index: clang/lib/Sema/SemaDecl.cpp
===
--- clang/lib/Sema/SemaDecl.cpp
+++ clang/lib/Sema/SemaDecl.cpp
@@ -14962,6 +14962,12 @@
   IdentifierInfo *Name = FD->getIdentifier();
   if (!Name)
 return;
+
+  if (getLangOpts().Coroutines && Name->isStr("pthread_self") &&
+  FD->hasAttr()) {
+FD->dropAttr();
+  }
+
   if ((!getLangOpts().CPlusPlus &&
FD->getDeclContext()->isTranslationUnit()) ||
   (isa(FD->getDeclContext()) &&


Index: clang/test/CodeGenCoroutines/coro-pthread_self.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-pthread_self.cpp
@@ -0,0 +1,56 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O3 -emit-llvm %s -o - | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+typedef void *pthread_t;
+pthread_t pthread_self(void) __attribute__((__const__));
+
+struct awaitable {
+  bool await_ready() { return false; }
+  void await_suspend(coro::coroutine_handle<> h);
+  void await_resume() {}
+};
+awaitable switch_to_new_thread();
+
+struct task {
+  struct promise_type {
+task get_return_object() { return {}; }
+coro::suspend_never initial_suspend() { return {}; }
+coro::suspend_never final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+  };
+};
+
+void check(pthread_t p1, pthread_t p2);
+
+task resuming_on_new_thread() {
+  auto pthread1 = pthread_self();
+  co_await switch_to_new_thread();
+  auto pthread2 = pthread_self();
+  check(pthread1, pthread2);
+}
+
+void non_coroutine() {
+  auto pthread1 = pthread_self();
+  check(pthread1, pthread1);
+  auto pthread2 = pthread_self();
+  check(pthread1, pthread2);
+}
+
+// CHECK-LABEL: define void @_Z13non_coroutinev()
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:%call = tail call i8* @_Z12pthread_selfv()
+// CHECK-NEXT:tail call void @_Z5checkPvS_(i8* %call, i8* %call)
+// CHECK-NEXT:%call1 = tail call i8* @_Z12pthread_selfv()
+// CHECK-NEXT:tail call void @_Z5checkPvS_(i8* %call, i8* %call1)
+// CHECK-NEXT:ret void
+// CHECK-NEXT:  }
+
+// CHECK-LABEL: define internal fastcc void @_Z22resuming_on_new_threadv.resume
+// CHECK: %[[RELOAD_ADDR:.+.reload.addr]] = getelementptr inbounds %_Z22resuming_on_new_threadv.Frame, 

[PATCH] D92661: [RFC] Fix TLS and Coroutine

2020-12-09 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 310575.
lxfind added a comment.
Herald added subscribers: nikic, kerbowa, jvesely.

Fix all failing tests


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92661/new/

https://reviews.llvm.org/D92661

Files:
  clang/lib/CodeGen/CGExpr.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGen/lto-newpm-pipeline.c
  clang/test/CodeGenCXX/cxx2a-thread-local-constinit.cpp
  clang/test/CodeGenCoroutines/coro-tls.cpp
  llvm/include/llvm/IR/IRBuilder.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/InitializePasses.h
  llvm/include/llvm/Transforms/Scalar.h
  llvm/include/llvm/Transforms/Scalar/LowerThreadLocalIntrinsic.h
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/lib/Transforms/Scalar/CMakeLists.txt
  llvm/lib/Transforms/Scalar/LowerThreadLocalIntrinsic.cpp
  llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
  llvm/test/Other/new-pass-manager.ll
  llvm/test/Other/new-pm-O0-defaults.ll
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/opt-O2-pipeline.ll
  llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
  llvm/test/Other/opt-O3-pipeline.ll
  llvm/test/Other/opt-Os-pipeline.ll
  llvm/test/Other/pass-pipelines.ll

Index: llvm/test/Other/pass-pipelines.ll
===
--- llvm/test/Other/pass-pipelines.ll
+++ llvm/test/Other/pass-pipelines.ll
@@ -72,6 +72,7 @@
 ; Next we break out of the main Function passes inside the CGSCC pipeline with
 ; a barrier pass.
 ; CHECK-O2: A No-Op Barrier Pass
+; CHECK-O2-NEXT: Lower ThreadLocal Intrinsics
 ; CHECK-O2-NEXT: Eliminate Available Externally
 ; Inferring function attribute should be right after the CGSCC pipeline, before
 ; any other optimizations/analyses.
Index: llvm/test/Other/opt-Os-pipeline.ll
===
--- llvm/test/Other/opt-Os-pipeline.ll
+++ llvm/test/Other/opt-Os-pipeline.ll
@@ -173,6 +173,7 @@
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: A No-Op Barrier Pass
+; CHECK-NEXT: Lower ThreadLocal Intrinsics
 ; CHECK-NEXT: Eliminate Available Externally Globals
 ; CHECK-NEXT: CallGraph Construction
 ; CHECK-NEXT: Deduce function attributes in RPO
Index: llvm/test/Other/opt-O3-pipeline.ll
===
--- llvm/test/Other/opt-O3-pipeline.ll
+++ llvm/test/Other/opt-O3-pipeline.ll
@@ -192,6 +192,7 @@
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: A No-Op Barrier Pass
+; CHECK-NEXT: Lower ThreadLocal Intrinsics
 ; CHECK-NEXT: Eliminate Available Externally Globals
 ; CHECK-NEXT: CallGraph Construction
 ; CHECK-NEXT: Deduce function attributes in RPO
Index: llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
===
--- llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
+++ llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
@@ -192,6 +192,7 @@
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: A No-Op Barrier Pass
+; CHECK-NEXT: Lower ThreadLocal Intrinsics
 ; CHECK-NEXT: Eliminate Available Externally Globals
 ; CHECK-NEXT: CallGraph Construction
 ; CHECK-NEXT: Deduce function attributes in RPO
Index: llvm/test/Other/opt-O2-pipeline.ll
===
--- llvm/test/Other/opt-O2-pipeline.ll
+++ llvm/test/Other/opt-O2-pipeline.ll
@@ -187,6 +187,7 @@
 ; CHECK-NEXT: Optimization Remark Emitter
 ; CHECK-NEXT: Combine redundant instructions
 ; CHECK-NEXT: A No-Op Barrier Pass
+; CHECK-NEXT: Lower ThreadLocal Intrinsics
 ; CHECK-NEXT: Eliminate Available Externally Globals
 ; CHECK-NEXT: CallGraph Construction
 ; CHECK-NEXT: Deduce function attributes in RPO
Index: llvm/test/Other/new-pm-defaults.ll
===
--- llvm/test/Other/new-pm-defaults.ll
+++ llvm/test/Other/new-pm-defaults.ll
@@ -209,6 +209,7 @@
 ; CHECK-EP-CGSCC-LATE-NEXT: Running pass: NoOpCGSCCPass
 ; CHECK-O-NEXT: Finished CGSCC pass manager run.
 ; CHECK-O-NEXT: Finished llvm::Module pass manager run.
+; CHECK-O-NEXT: Running pass: LowerThreadLocalIntrinsicPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-DEFAULT-NEXT: Running pass: EliminateAvailableExternallyPass
Index: llvm/test/Other/new-pm-O0-defaults.ll
===
--- llvm/test/Other/new-pm-O0-defaults.ll
+++ llvm/test/Other/new-pm-O0-defaults.ll
@@ -32,6 +32,7 @@
 ; CHECK-DEFAULT-NEXT: Running analysis: ProfileSummaryAnalysis
 ; CHECK-MATRIX-NEXT: 

[PATCH] D92661: [RFC] Fix TLS and Coroutine

2020-12-08 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: llvm/include/llvm/IR/Intrinsics.td:1309
+// Intrinsic to obtain the address of a thread_local variable.
+def int_threadlocal : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty]>;
+

hoy wrote:
> lxfind wrote:
> > hoy wrote:
> > > lxfind wrote:
> > > > hoy wrote:
> > > > > hoy wrote:
> > > > > > With the intrinsic, can TLS variable reference in the same 
> > > > > > coroutine or regular routine be DCE-ed anymore?
> > > > > Sorry, I meant CSE-ed.
> > > > Since the intrinsics does not have readnone attribute, it won't be 
> > > > CSE-ed before CoroSplit.
> > > > However after CoroSplit, it will be lowered back to the direct 
> > > > reference of the TLS, and will be CSE-ed by latter passes.
> > > > I can add a test function to demonstrate that too.
> > > Sounds good. Can you please point out what optimization passes CSE-ed tls 
> > > reference without this implementation? I'm wondering if those 
> > > optimizations can be postponed to after CoroSplit. 
> > To clarify, it wasn't just CSE that would merge the references of the same 
> > TLS.
> > For instance, without this patch, a reference to "tls_variable" will just 
> > be "@tls_variable". For code like this:
> > 
> >   @tls_variable = internal thread_local global i32 0, align 4
> > 
> >   define i32* @foo(){
> > ret i32* @tls_variable
> >   }
> >   
> >   define void @bar() {
> > %tls1 = call i32* @foo()
> > ..coro.suspend..
> > %tls2 = call i32* @foo()
> > %cond = icmp eq i32* %tls1, %tls2
> >   }
> > 
> > When foo() is inlined into bar(), all uses of %tls1 will be replaced with 
> > @tls_variable.
> Thanks for the explanation. I have a dumb question. Why isn't corosplit 
> placed at the very beginning of the pipeline?
The coroutine frame size is determined during CoroSplit. So if CoroSplit 
happens too early without any optimizations, the frame size will always be very 
big and there is no chance to optimize it.
This is indeed a fundamental trade-off. If CoroSplit happens much earlier then 
it will be immune to this kind of problem.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92661/new/

https://reviews.llvm.org/D92661

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92661: [RFC] Fix TLS and Coroutine

2020-12-04 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: llvm/include/llvm/IR/Intrinsics.td:1309
+// Intrinsic to obtain the address of a thread_local variable.
+def int_threadlocal : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty]>;
+

hoy wrote:
> lxfind wrote:
> > hoy wrote:
> > > hoy wrote:
> > > > With the intrinsic, can TLS variable reference in the same coroutine or 
> > > > regular routine be DCE-ed anymore?
> > > Sorry, I meant CSE-ed.
> > Since the intrinsics does not have readnone attribute, it won't be CSE-ed 
> > before CoroSplit.
> > However after CoroSplit, it will be lowered back to the direct reference of 
> > the TLS, and will be CSE-ed by latter passes.
> > I can add a test function to demonstrate that too.
> Sounds good. Can you please point out what optimization passes CSE-ed tls 
> reference without this implementation? I'm wondering if those optimizations 
> can be postponed to after CoroSplit. 
To clarify, it wasn't just CSE that would merge the references of the same TLS.
For instance, without this patch, a reference to "tls_variable" will just be 
"@tls_variable". For code like this:

  @tls_variable = internal thread_local global i32 0, align 4

  define i32* @foo(){
ret i32* @tls_variable
  }
  
  define void @bar() {
%tls1 = call i32* @foo()
..coro.suspend..
%tls2 = call i32* @foo()
%cond = icmp eq i32* %tls1, %tls2
  }

When foo() is inlined into bar(), all uses of %tls1 will be replaced with 
@tls_variable.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92661/new/

https://reviews.llvm.org/D92661

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92661: [RFC] Fix TLS and Coroutine

2020-12-04 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: llvm/include/llvm/IR/Intrinsics.td:1309
+// Intrinsic to obtain the address of a thread_local variable.
+def int_threadlocal : Intrinsic<[llvm_ptr_ty], [llvm_ptr_ty]>;
+

hoy wrote:
> hoy wrote:
> > With the intrinsic, can TLS variable reference in the same coroutine or 
> > regular routine be DCE-ed anymore?
> Sorry, I meant CSE-ed.
Since the intrinsics does not have readnone attribute, it won't be CSE-ed 
before CoroSplit.
However after CoroSplit, it will be lowered back to the direct reference of the 
TLS, and will be CSE-ed by latter passes.
I can add a test function to demonstrate that too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92661/new/

https://reviews.llvm.org/D92661

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D92662: [Clang][Coroutine] Drop const attribute on pthread_self when coroutine is enabled

2020-12-04 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: hoy, modimo, wenlei, modocache.
lxfind requested review of this revision.
Herald added a project: clang.
Herald added a subscriber: cfe-commits.

This patch is to address https://bugs.llvm.org/show_bug.cgi?id=47833
A relevant discussion can also be found in 
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146766.html

pthread_self() from glibc is defined with "__attribute__
((__const__))". The const attribute tells the compiler that it does
not read nor write any global state and hence always return the same
result. Hence in the following code:

auto x1 = pthread_self();
...
auto x2 = pthread_self();

the second call to pthread_self() can be optimized out. This has been
correct until coroutines. With coroutines, we can have code like this:

auto x1 = pthread_self();
co_await ...
auto x2 = pthread_self();

Now because of the co_await, the function can suspend and resume in a
different thread, in which case the second call to pthread_self()
should return a different result than the first one. Unfortunately
LLVM will still optimize out the second call in the case of
coroutines.

To fix the issue, this patch drops the readnone attribute from the pthread_self 
function in Clang.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D92662

Files:
  clang/lib/Sema/SemaDecl.cpp
  clang/test/CodeGenCoroutines/coro-pthread_self.cpp


Index: clang/test/CodeGenCoroutines/coro-pthread_self.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-pthread_self.cpp
@@ -0,0 +1,58 @@
+// RUN: %clang -fcoroutines-ts -std=c++14 -O3 -emit-llvm -S %s -o - | 
FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+typedef void *pthread_t;
+pthread_t pthread_self(void) __attribute__((__const__));
+
+struct awaitable {
+  bool await_ready() { return false; }
+  void await_suspend(coro::coroutine_handle<> h);
+  void await_resume() {}
+};
+awaitable switch_to_new_thread();
+
+struct task {
+  struct promise_type {
+task get_return_object() { return {}; }
+coro::suspend_never initial_suspend() { return {}; }
+coro::suspend_never final_suspend() noexcept { return {}; }
+void return_void() {}
+void unhandled_exception() {}
+  };
+};
+
+void check(pthread_t p1, pthread_t p2);
+
+task resuming_on_new_thread() {
+  auto pthread1 = pthread_self();
+  co_await switch_to_new_thread();
+  auto pthread2 = pthread_self();
+  check(pthread1, pthread2);
+}
+
+void non_coroutine() {
+  auto pthread1 = pthread_self();
+  check(pthread1, pthread1);
+  auto pthread2 = pthread_self();
+  check(pthread1, pthread2);
+}
+
+// CHECK-LABEL: define void @_Z13non_coroutinev()
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:%call = tail call i8* @_Z12pthread_selfv()
+// CHECK-NEXT:tail call void @_Z5checkPvS_(i8* %call, i8* %call)
+// CHECK-NEXT:%call1 = tail call i8* @_Z12pthread_selfv()
+// CHECK-NEXT:tail call void @_Z5checkPvS_(i8* %call, i8* %call1)
+// CHECK-NEXT:ret void
+// CHECK-NEXT:  }
+
+// CHECK-LABEL: define internal fastcc void @_Z22resuming_on_new_threadv.resume
+// CHECK: %[[CALL:.+]] = invoke i8* @_Z12pthread_selfv()
+// CHECK-NEXT:to label %[[CONT:.+]] unwind label %{{.+}}
+// CHECK:  [[CONT]]:
+// CHECK-NEXT:%[[RELOAD_ADDR:.+]] = getelementptr inbounds 
%_Z22resuming_on_new_threadv.Frame, %_Z22resuming_on_new_threadv.Frame* 
%FramePtr, i64 0, i32 {{.+}}
+// CHECK-NEXT:%[[RELOAD:.+]] = load i8*, i8** %[[RELOAD_ADDR]], align 8
+// CHECK-NEXT:invoke void @_Z5checkPvS_(i8* %[[RELOAD]], i8* %[[CALL]])
Index: clang/lib/Sema/SemaDecl.cpp
===
--- clang/lib/Sema/SemaDecl.cpp
+++ clang/lib/Sema/SemaDecl.cpp
@@ -14938,6 +14938,12 @@
   IdentifierInfo *Name = FD->getIdentifier();
   if (!Name)
 return;
+
+  if (getLangOpts().Coroutines && Name->isStr("pthread_self") &&
+  FD->hasAttr()) {
+FD->dropAttr();
+  }
+
   if ((!getLangOpts().CPlusPlus &&
FD->getDeclContext()->isTranslationUnit()) ||
   (isa(FD->getDeclContext()) &&


Index: clang/test/CodeGenCoroutines/coro-pthread_self.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-pthread_self.cpp
@@ -0,0 +1,58 @@
+// RUN: %clang -fcoroutines-ts -std=c++14 -O3 -emit-llvm -S %s -o - | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+typedef void *pthread_t;
+pthread_t pthread_self(void) __attribute__((__const__));
+
+struct awaitable {
+  bool await_ready() { return false; }
+  void await_suspend(coro::coroutine_handle<> h);
+  void await_resume() {}
+};
+awaitable switch_to_new_thread();
+
+struct task {
+  struct promise_type {
+task get_return_object() { return {}; }
+coro::suspend_never initial_suspend() { return {}; }
+

[PATCH] D92661: [RFC] Fix TLS and Coroutine

2020-12-04 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: hoy, modimo, wenlei, steven_wu, modocache, hiraditya, 
mgorny.
lxfind requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added subscribers: llvm-commits, cfe-commits, sstefan1, jdoerfert.
Herald added projects: clang, LLVM.

This patch is to address https://bugs.llvm.org/show_bug.cgi?id=47835.
A relevant discussion regarding pthread_self and TLS can be found here: 
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146766.html.

A coroutine may suspend and resume on a different thread, and hence the address 
of a thread_local variable may change after coroutine suspension.
In the existing design, getting the address of a TLS variable is through a 
direct reference, like @tls_variable. Such kind of value can be
arbitrarily moved around/replaced in the IR within the same function. This will 
lead to incorrect caching of TLS variable address in coroutines across 
suspension points.
To fix it, we have to turn the TLS address access into an intrinsics call, so 
that it will not be simply CSE-ed.
After CoroSplit, we no longer have coroutines, and hence can safely lower the 
TLS intrinsics back into references.

Note:
The current placement of the LowerThreadLocalIntrinsicPass may not be ideal. I 
am not quite sure how to organize it. Suggestions welcome!
Testing isn't sufficient, and there may also be failing tests. I will add/fix 
more tests if this patch is along the right direction.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D92661

Files:
  clang/lib/CodeGen/CGExpr.cpp
  clang/lib/CodeGen/ItaniumCXXABI.cpp
  clang/test/CodeGen/lto-newpm-pipeline.c
  clang/test/CodeGenCXX/cxx2a-thread-local-constinit.cpp
  clang/test/CodeGenCoroutines/coro-tls.cpp
  llvm/include/llvm/IR/IRBuilder.h
  llvm/include/llvm/IR/Intrinsics.td
  llvm/include/llvm/InitializePasses.h
  llvm/include/llvm/Transforms/Scalar.h
  llvm/include/llvm/Transforms/Scalar/LowerThreadLocalIntrinsic.h
  llvm/lib/IR/IRBuilder.cpp
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/lib/Transforms/Scalar/CMakeLists.txt
  llvm/lib/Transforms/Scalar/LowerThreadLocalIntrinsic.cpp
  llvm/test/Other/new-pass-manager.ll
  llvm/test/Other/new-pm-O0-defaults.ll
  llvm/test/Other/new-pm-defaults.ll

Index: llvm/test/Other/new-pm-defaults.ll
===
--- llvm/test/Other/new-pm-defaults.ll
+++ llvm/test/Other/new-pm-defaults.ll
@@ -209,6 +209,7 @@
 ; CHECK-EP-CGSCC-LATE-NEXT: Running pass: NoOpCGSCCPass
 ; CHECK-O-NEXT: Finished CGSCC pass manager run.
 ; CHECK-O-NEXT: Finished llvm::Module pass manager run.
+; CHECK-O-NEXT: Running pass: LowerThreadLocalIntrinsicPass
 ; CHECK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-DEFAULT-NEXT: Running pass: EliminateAvailableExternallyPass
Index: llvm/test/Other/new-pm-O0-defaults.ll
===
--- llvm/test/Other/new-pm-O0-defaults.ll
+++ llvm/test/Other/new-pm-O0-defaults.ll
@@ -32,6 +32,7 @@
 ; CHECK-DEFAULT-NEXT: Running analysis: ProfileSummaryAnalysis
 ; CHECK-MATRIX-NEXT: Running pass: LowerMatrixIntrinsicsPass
 ; CHECK-MATRIX-NEXT: Running analysis: TargetIRAnalysis
+; CHECK-DEFAULT-NEXT: Running pass: LowerThreadLocalIntrinsicPass
 ; CHECK-PRE-LINK-NEXT: Running pass: CanonicalizeAliasesPass
 ; CHECK-PRE-LINK-NEXT: Running pass: NameAnonGlobalPass
 ; CHECK-THINLTO-NEXT: Running pass: Annotation2MetadataPass
Index: llvm/test/Other/new-pass-manager.ll
===
--- llvm/test/Other/new-pass-manager.ll
+++ llvm/test/Other/new-pass-manager.ll
@@ -366,6 +366,7 @@
 ; CHECK-EXT-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-EXT-NEXT: Running pass: {{.*}}Bye
 ; CHECK-EXT-NEXT: Finished llvm::Function pass manager run.
+; CHECK-O0-NEXT: Running pass: LowerThreadLocalIntrinsicPass
 ; CHECK-O0-NEXT: Finished llvm::Module pass manager run
 
 ; RUN: opt -disable-output -disable-verify -debug-pass-manager \
Index: llvm/lib/Transforms/Scalar/LowerThreadLocalIntrinsic.cpp
===
--- /dev/null
+++ llvm/lib/Transforms/Scalar/LowerThreadLocalIntrinsic.cpp
@@ -0,0 +1,76 @@
+//===- LowerThreadLocalIntrinsic.cpp - Lower the threadlocal intrinsic
+//---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+//
+// This pass lowers the llvm.threadlocal intrinsic to a direct reference to the
+// thread local variable.
+//
+//===--===//
+
+#include 

[PATCH] D86853: [modules] Fix crash in call to `FunctionDecl::setPure()`

2020-11-18 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

@rsmith, @v.g.vassilev hey I stamped this patch assuming it looks ok. But 
definitely shout at me if more feedback needs to be addressed. Happy to follow 
up.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86853/new/

https://reviews.llvm.org/D86853

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D86853: [modules] Fix crash in call to `FunctionDecl::setPure()`

2020-11-18 Thread Xun Li via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rGc6c8d4a13ebd: [modules] Fix crash in call to 
`FunctionDecl::setPure()` (authored by andrewjcg, committed by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86853/new/

https://reviews.llvm.org/D86853

Files:
  clang/lib/Serialization/ASTReaderDecl.cpp
  clang/test/Modules/Inputs/set-pure-crash/a.h
  clang/test/Modules/Inputs/set-pure-crash/b.h
  clang/test/Modules/Inputs/set-pure-crash/c.h
  clang/test/Modules/Inputs/set-pure-crash/module.modulemap
  clang/test/Modules/set-pure-crash.cpp

Index: clang/test/Modules/set-pure-crash.cpp
===
--- /dev/null
+++ clang/test/Modules/set-pure-crash.cpp
@@ -0,0 +1,9 @@
+// RUN: rm -rf %t
+// RUN: %clang_cc1 -fsyntax-only -fmodules -fimplicit-module-maps -fmodules-cache-path=%t -x c++ -I %S/Inputs/set-pure-crash -verify %s -o %t
+
+// expected-no-diagnostics
+
+#include "b.h"
+#include "c.h"
+
+auto t = simple();
Index: clang/test/Modules/Inputs/set-pure-crash/module.modulemap
===
--- /dev/null
+++ clang/test/Modules/Inputs/set-pure-crash/module.modulemap
@@ -0,0 +1,11 @@
+module a {
+  header "a.h"
+}
+
+module b {
+  header "b.h"
+}
+
+module c {
+  header "c.h"
+}
Index: clang/test/Modules/Inputs/set-pure-crash/c.h
===
--- /dev/null
+++ clang/test/Modules/Inputs/set-pure-crash/c.h
@@ -0,0 +1,5 @@
+#pragma once
+
+template 
+struct simple {
+};
Index: clang/test/Modules/Inputs/set-pure-crash/b.h
===
--- /dev/null
+++ clang/test/Modules/Inputs/set-pure-crash/b.h
@@ -0,0 +1,14 @@
+#pragma once
+
+#include "a.h"
+#include "c.h"
+
+template >
+void foo(Fun) {}
+
+class Child : public Base {
+public:
+  void func() {
+foo([]() {});
+  }
+};
Index: clang/test/Modules/Inputs/set-pure-crash/a.h
===
--- /dev/null
+++ clang/test/Modules/Inputs/set-pure-crash/a.h
@@ -0,0 +1,11 @@
+#pragma once
+
+struct Tag {};
+
+template 
+class Base {
+public:
+  virtual void func() = 0;
+};
+
+Base bar();
Index: clang/lib/Serialization/ASTReaderDecl.cpp
===
--- clang/lib/Serialization/ASTReaderDecl.cpp
+++ clang/lib/Serialization/ASTReaderDecl.cpp
@@ -868,7 +868,10 @@
   FD->setInlineSpecified(Record.readInt());
   FD->setImplicitlyInline(Record.readInt());
   FD->setVirtualAsWritten(Record.readInt());
-  FD->setPure(Record.readInt());
+  // We defer calling `FunctionDecl::setPure()` here as for methods of
+  // `CXXTemplateSpecializationDecl`s, we may not have connected up the
+  // definition (which is required for `setPure`).
+  const bool Pure = Record.readInt();
   FD->setHasInheritedPrototype(Record.readInt());
   FD->setHasWrittenPrototype(Record.readInt());
   FD->setDeletedAsWritten(Record.readInt());
@@ -1015,6 +1018,10 @@
   }
   }
 
+  // Defer calling `setPure` until merging above has guaranteed we've set
+  // `DefinitionData` (as this will need to access it).
+  FD->setPure(Pure);
+
   // Read in the parameters.
   unsigned NumParams = Record.readInt();
   SmallVector Params;
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D90990: [Coroutine][Sema] Cleanup temporaries as early as possible

2020-11-10 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rG19f077092343: [Coroutine][Sema] Cleanup temporaries as early 
as possible (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90990/new/

https://reviews.llvm.org/D90990

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/AST/Inputs/std-coroutine.h
  clang/test/AST/coroutine-locals-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp

Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
@@ -0,0 +1,126 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct Task {
+  struct promise_type {
+Task get_return_object() noexcept {
+  return Task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() noexcept {}
+
+struct final_awaiter {
+  bool await_ready() noexcept { return false; }
+  coro::coroutine_handle<> await_suspend(coro::coroutine_handle h) noexcept {
+h.destroy();
+return {};
+  }
+  void await_resume() noexcept {}
+};
+
+void unhandled_exception() noexcept {}
+
+final_awaiter final_suspend() noexcept { return {}; }
+
+coro::suspend_always initial_suspend() noexcept { return {}; }
+
+template 
+auto await_transform(Awaitable &) {
+  return awaitable.co_viaIfAsync();
+}
+  };
+
+  using handle_t = coro::coroutine_handle;
+
+  class Awaiter {
+  public:
+explicit Awaiter(handle_t coro) noexcept;
+Awaiter(Awaiter &) noexcept;
+Awaiter(const Awaiter &) = delete;
+~Awaiter();
+
+bool await_ready() noexcept { return false; }
+handle_t await_suspend(coro::coroutine_handle<> continuation) noexcept;
+void await_resume();
+
+  private:
+handle_t coro_;
+  };
+
+  Task(handle_t coro) noexcept : coro_(coro) {}
+
+  handle_t coro_;
+
+  Task(const Task ) = delete;
+  Task(Task &) noexcept;
+  ~Task();
+  Task =(Task t) noexcept;
+
+  Awaiter co_viaIfAsync();
+};
+
+static Task foo() {
+  co_return;
+}
+
+Task bar() {
+  auto mode = 2;
+  switch (mode) {
+  case 1:
+co_await foo();
+break;
+  case 2:
+co_await foo();
+break;
+  default:
+break;
+  }
+}
+
+// CHECK-LABEL: define void @_Z3barv
+// CHECK: %[[MODE:.+]] = load i32, i32* %mode
+// CHECK-NEXT:switch i32 %[[MODE]], label %{{.+}} [
+// CHECK-NEXT:  i32 1, label %[[CASE1:.+]]
+// CHECK-NEXT:  i32 2, label %[[CASE2:.+]]
+// CHECK-NEXT:]
+
+// CHECK:   [[CASE1]]:
+// CHECK: br i1 %{{.+}}, label %[[CASE1_AWAIT_READY:.+]], label %[[CASE1_AWAIT_SUSPEND:.+]]
+// CHECK:   [[CASE1_AWAIT_SUSPEND]]:
+// CHECK-NEXT:%{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK-NEXT:%[[HANDLE11:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP1:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HANDLE11]])
+
+// CHECK: %[[HANDLE12:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP1]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HANDLE12]])
+// CHECK-NEXT:call void @llvm.coro.resume
+// CHECK-NEXT:%{{.+}} = call i8 @llvm.coro.suspend
+// CHECK-NEXT:switch i8 %{{.+}}, label %coro.ret [
+// CHECK-NEXT:  i8 0, label %[[CASE1_AWAIT_READY]]
+// CHECK-NEXT:  i8 1, label %[[CASE1_AWAIT_CLEANUP:.+]]
+// CHECK-NEXT:]
+// CHECK:   [[CASE1_AWAIT_CLEANUP]]:
+// make sure that the awaiter eventually gets cleaned up.
+// CHECK: call void @{{.+Awaiter.+}}
+
+// CHECK:   [[CASE2]]:
+// CHECK: br i1 %{{.+}}, label %[[CASE2_AWAIT_READY:.+]], label %[[CASE2_AWAIT_SUSPEND:.+]]
+// CHECK:   [[CASE2_AWAIT_SUSPEND]]:
+// CHECK-NEXT:%{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK-NEXT:%[[HANDLE21:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP2:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HANDLE21]])
+
+// CHECK: %[[HANDLE22:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP2]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HANDLE22]])
+// CHECK-NEXT:call void @llvm.coro.resume
+// CHECK-NEXT:%{{.+}} = call i8 @llvm.coro.suspend
+// CHECK-NEXT:switch i8 %{{.+}}, label %coro.ret [
+// CHECK-NEXT:  i8 0, label %[[CASE2_AWAIT_READY]]
+// CHECK-NEXT:  i8 1, label 

[PATCH] D90990: [Coroutine][Sema] Cleanup temporaries as early as possible

2020-11-10 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 304233.
lxfind added a comment.

Add AST test


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90990/new/

https://reviews.llvm.org/D90990

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/AST/Inputs/std-coroutine.h
  clang/test/AST/coroutine-locals-cleanup.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp

Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
@@ -0,0 +1,126 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct Task {
+  struct promise_type {
+Task get_return_object() noexcept {
+  return Task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() noexcept {}
+
+struct final_awaiter {
+  bool await_ready() noexcept { return false; }
+  coro::coroutine_handle<> await_suspend(coro::coroutine_handle h) noexcept {
+h.destroy();
+return {};
+  }
+  void await_resume() noexcept {}
+};
+
+void unhandled_exception() noexcept {}
+
+final_awaiter final_suspend() noexcept { return {}; }
+
+coro::suspend_always initial_suspend() noexcept { return {}; }
+
+template 
+auto await_transform(Awaitable &) {
+  return awaitable.co_viaIfAsync();
+}
+  };
+
+  using handle_t = coro::coroutine_handle;
+
+  class Awaiter {
+  public:
+explicit Awaiter(handle_t coro) noexcept;
+Awaiter(Awaiter &) noexcept;
+Awaiter(const Awaiter &) = delete;
+~Awaiter();
+
+bool await_ready() noexcept { return false; }
+handle_t await_suspend(coro::coroutine_handle<> continuation) noexcept;
+void await_resume();
+
+  private:
+handle_t coro_;
+  };
+
+  Task(handle_t coro) noexcept : coro_(coro) {}
+
+  handle_t coro_;
+
+  Task(const Task ) = delete;
+  Task(Task &) noexcept;
+  ~Task();
+  Task =(Task t) noexcept;
+
+  Awaiter co_viaIfAsync();
+};
+
+static Task foo() {
+  co_return;
+}
+
+Task bar() {
+  auto mode = 2;
+  switch (mode) {
+  case 1:
+co_await foo();
+break;
+  case 2:
+co_await foo();
+break;
+  default:
+break;
+  }
+}
+
+// CHECK-LABEL: define void @_Z3barv
+// CHECK: %[[MODE:.+]] = load i32, i32* %mode
+// CHECK-NEXT:switch i32 %[[MODE]], label %{{.+}} [
+// CHECK-NEXT:  i32 1, label %[[CASE1:.+]]
+// CHECK-NEXT:  i32 2, label %[[CASE2:.+]]
+// CHECK-NEXT:]
+
+// CHECK:   [[CASE1]]:
+// CHECK: br i1 %{{.+}}, label %[[CASE1_AWAIT_READY:.+]], label %[[CASE1_AWAIT_SUSPEND:.+]]
+// CHECK:   [[CASE1_AWAIT_SUSPEND]]:
+// CHECK-NEXT:%{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK-NEXT:%[[HANDLE11:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP1:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HANDLE11]])
+
+// CHECK: %[[HANDLE12:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP1]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HANDLE12]])
+// CHECK-NEXT:call void @llvm.coro.resume
+// CHECK-NEXT:%{{.+}} = call i8 @llvm.coro.suspend
+// CHECK-NEXT:switch i8 %{{.+}}, label %coro.ret [
+// CHECK-NEXT:  i8 0, label %[[CASE1_AWAIT_READY]]
+// CHECK-NEXT:  i8 1, label %[[CASE1_AWAIT_CLEANUP:.+]]
+// CHECK-NEXT:]
+// CHECK:   [[CASE1_AWAIT_CLEANUP]]:
+// make sure that the awaiter eventually gets cleaned up.
+// CHECK: call void @{{.+Awaiter.+}}
+
+// CHECK:   [[CASE2]]:
+// CHECK: br i1 %{{.+}}, label %[[CASE2_AWAIT_READY:.+]], label %[[CASE2_AWAIT_SUSPEND:.+]]
+// CHECK:   [[CASE2_AWAIT_SUSPEND]]:
+// CHECK-NEXT:%{{.+}} = call token @llvm.coro.save(i8* null)
+// CHECK-NEXT:%[[HANDLE21:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP2:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[HANDLE21]])
+
+// CHECK: %[[HANDLE22:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle"* %[[TMP2]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[HANDLE22]])
+// CHECK-NEXT:call void @llvm.coro.resume
+// CHECK-NEXT:%{{.+}} = call i8 @llvm.coro.suspend
+// CHECK-NEXT:switch i8 %{{.+}}, label %coro.ret [
+// CHECK-NEXT:  i8 0, label %[[CASE2_AWAIT_READY]]
+// CHECK-NEXT:  i8 1, label %[[CASE2_AWAIT_CLEANUP:.+]]
+// CHECK-NEXT:]
+// CHECK:   [[CASE2_AWAIT_CLEANUP]]:
+// make sure that the awaiter eventually gets cleaned up.
+// CHECK: 

[PATCH] D90990: [Coroutine][Sema] Cleanup temporaries as early as possible

2020-11-10 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/lib/Sema/SemaCoroutine.cpp:475
+  if (!AwaitSuspend)
+return Calls;
   if (!AwaitSuspend->getType()->isDependentType()) {

lxfind wrote:
> bruno wrote:
> > In case `AwaitSuspend` is null, is there any need to set `Calls.IsInvalid` 
> > as well?
> Thanks for the catch.
Oh actually this is already set in BuildSubExpr.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90990/new/

https://reviews.llvm.org/D90990

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D90990: [Coroutine][Sema] Cleanup temporaries as early as possible

2020-11-10 Thread Xun Li via Phabricator via cfe-commits
lxfind added inline comments.



Comment at: clang/lib/Sema/SemaCoroutine.cpp:475
+  if (!AwaitSuspend)
+return Calls;
   if (!AwaitSuspend->getType()->isDependentType()) {

bruno wrote:
> In case `AwaitSuspend` is null, is there any need to set `Calls.IsInvalid` as 
> well?
Thanks for the catch.



Comment at: clang/lib/Sema/SemaCoroutine.cpp:490
+  // ExprWithCleanups is wrapped within maybeTailCall() prior to the resume
+  // call.
   Calls.Results[ACT::ACT_Suspend] = TailCallSuspend;

bruno wrote:
> Is there already a test covering this tailcall case? It'd be nice to have one 
Yes both of the symmetric-transfer tests are covering this case.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90990/new/

https://reviews.llvm.org/D90990

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D90990: [Coroutine][Sema] Cleanup temporaries as early as possible

2020-11-06 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
Herald added subscribers: cfe-commits, modimo, wenlei, modocache.
Herald added a project: clang.
lxfind requested review of this revision.

The original bug was discovered in T75057860. Clang front-end emits an AST that 
looks like this for an co_await expression:

| - ExprWithCleanups  |
| - -CoawaitExpr  |
| - -MaterializeTemporaryExpr ... Awaiter |
|

...
  |- -CXXMemberCallExpr ... .await_ready
...
  |- -CallExpr ... __builtin_coro_resume
...
  |- -CXXMemberCallExpr ... .await_resume
...

ExprWithCleanups is responsible for cleaning up (including calling dtors) for 
the temporaries generated in the wrapping expression).
In the above structure, the __builtin_coro_resume part (which corresponds to 
the code for the suspend case in the co_await with symmetric transfer), the 
pseudocode looks like this:

  __builtin_coro_resume(
   awaiter.await_suspend(
 from_address(
   __builtin_coro_frame())).address());

One of the temporaries that's generated as part of this code is the coroutine 
handle returned from awaiter.await_suspend() call. The call returns a handle  
which is a prvalue (since it's a returned value on the fly). In order to call 
the address() method on it, it needs to be converted into an xvalue. Hence a 
materialized temp is created to hold it. This temp will need to be cleaned up 
eventually. Now, since all cleanups happen at the end of the entire co_await 
expression, which is after the  suspension point, the compiler 
will think that such a temp needs to live across suspensions, and need to be 
put on the coroutine frame, even though it's only used temporarily just to call 
address() method.
Such a phenomena not only unnecessarily increases the frame size, but can lead 
to ASAN failures, if the coroutine was already destroyed as part of the 
await_suspend() call. This is because if the coroutine was already destroyed, 
the frame no longer exists, and one can not store anything into it. But if the 
temporary object is considered to need to live on the frame, it will be stored 
into the frame after await_suspend() returns.

A fix attempt was done in https://reviews.llvm.org/D87470. Unfortunately it is 
incorrect. The reason is that cleanups in Clang works more like linearly than 
nested. There is one current state indicating whether it needs cleanup, and an 
ExprWithCleanups resets that state. This means that an ExprWithCleanups must be 
capable of cleaning up all temporaries created  in the wrapping expression, 
otherwise there will be dangling temporaries cleaned up at the wrong place.
I eventually found a walk-around (https://reviews.llvm.org/D89066) that doesn't 
break any existing tests while fixing the issue. But it targets the final 
co_await only. If we ever have a co_await that's not on the final awaiter and 
the frame gets destroyed after suspend, we are in trouble. Hence we need a 
proper fix.

This patch is the proper fix. It does the folllowing things to fully resolve 
the issue:

1. The AST has to be generated in the order according to their nesting 
relationship. We should not generate AST out of order because then the code 
generator would incorrectly track the state of temporaries and when a cleanup 
is needed. So the code in buildCoawaitCalls is reorganized so that we will be 
generating the AST for each coawait member call in order along with their child 
AST.
2. await_ready() call is wrapped with an ExprWithCleanups so that temporaries 
in it gets cleaned up as early as possible to avoid living across suspension.
3. await_suspend() call is wrapped with an ExprWithCleanups if it's not a 
symmetric transfer. In the case of a symmetric transfer, in order to maintain 
the musttail call contract, the ExprWithCleanups is wraaped before the resume 
call.
4. In the end, we mark again that it needs a cleanup, so that the entire 
CoawaitExpr will be wrapped with a ExprWithCleanups which will clean up the 
Awaiter object associated with the await expression.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D90990

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-01.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp

Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer-02.cpp
@@ -0,0 +1,126 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct Task {
+  struct promise_type {
+Task get_return_object() noexcept {
+  return Task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() 

[PATCH] D89269: [Coroutine] Rename coro-semmetric-transfer.cpp and fix test failure

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGd80ecdf27faf: [Coroutine] Rename coro-semmetric-transfer.cpp 
and possibly fix test failure (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89269/new/

https://reviews.llvm.org/D89269

Files:
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | 
FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 
-O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -48,6 +48,10 @@
   co_return;
 }
 
-// check that the lifetime of the coroutine handle used to obtain the address 
ended right away.
-// CHECK:   %{{.*}} = call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 nonnull %{{.*}})
-// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
+// check that the lifetime of the coroutine handle used to obtain the address 
is contained within single basic block.
+// CHECK-LABEL: final.suspend:
+// CHECK: %[[PTR1:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* 
%[[ADDR_TMP:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
+// CHECK: call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 %[[ADDR_TMP]])
+// CHECK-NEXT:%[[PTR2:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] 
to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])


Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -48,6 +48,10 @@
   co_return;
 }
 
-// check that the lifetime of the coroutine handle used to obtain the address ended right away.
-// CHECK:   %{{.*}} = call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull %{{.*}})
-// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
+// check that the lifetime of the coroutine handle used to obtain the address is contained within single basic block.
+// CHECK-LABEL: final.suspend:
+// CHECK: %[[PTR1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
+// CHECK: call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]])
+// CHECK-NEXT:%[[PTR2:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

Test failures are being fixed in https://reviews.llvm.org/D89269.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89066/new/

https://reviews.llvm.org/D89066

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89269: [Coroutine] Rename coro-semmetric-transfer.cpp and possibly fix test failure

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
lxfind added reviewers: wenlei, junparser.
Herald added subscribers: cfe-commits, modimo, modocache.
Herald added a project: clang.
lxfind requested review of this revision.

Some tests start to fail after https://reviews.llvm.org/D89066.
It's because the size of pointers are different on different targets.
Limit the target in the command so there is no confusion.
Also noticed I had typo in the test name.
Adding disable-llvm-passes option to make the test more stable as well.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89269

Files:
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
  clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | 
FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 
-O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -48,6 +48,10 @@
   co_return;
 }
 
-// check that the lifetime of the coroutine handle used to obtain the address 
ended right away.
-// CHECK:   %{{.*}} = call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 nonnull %{{.*}})
-// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
+// check that the lifetime of the coroutine handle used to obtain the address 
is contained within single basic block.
+// CHECK-LABEL: final.suspend:
+// CHECK: %[[PTR1:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* 
%[[ADDR_TMP:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
+// CHECK: call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 %[[ADDR_TMP]])
+// CHECK-NEXT:%[[PTR2:.+]] = bitcast 
%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] 
to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])


Index: clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-symmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | FileCheck %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines-ts -std=c++14 -O1 -emit-llvm %s -o - -disable-llvm-passes | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
@@ -48,6 +48,10 @@
   co_return;
 }
 
-// check that the lifetime of the coroutine handle used to obtain the address ended right away.
-// CHECK:   %{{.*}} = call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull %{{.*}})
-// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
+// check that the lifetime of the coroutine handle used to obtain the address is contained within single basic block.
+// CHECK-LABEL: final.suspend:
+// CHECK: %[[PTR1:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP:.+]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.start.p0i8(i64 8, i8* %[[PTR1]])
+// CHECK: call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]])
+// CHECK-NEXT:%[[PTR2:.+]] = bitcast %"struct.std::experimental::coroutines_v1::coroutine_handle.0"* %[[ADDR_TMP]] to i8*
+// CHECK-NEXT:call void @llvm.lifetime.end.p0i8(i64 8, i8* %[[PTR2]])
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

There seems to be build failures in the buildbot, but I don't understand why 
it's happening.. (unable to repro locally and the patterns seem reasonable)
http://lab.llvm.org:8011/#/builders/12/builds/92/steps/7/logs/FAIL__Clang__coro-semmetric-transfer_cpp


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89066/new/

https://reviews.llvm.org/D89066

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGdce8f2bb25ea: [Coroutine][Sema] Only tighten the suspend 
call temp lifetime for final awaiter (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89066/new/

https://reviews.llvm.org/D89066

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | 
FileCheck %s
 
 #include "Inputs/coroutine.h"
 
Index: clang/lib/Sema/SemaCoroutine.cpp
===
--- clang/lib/Sema/SemaCoroutine.cpp
+++ clang/lib/Sema/SemaCoroutine.cpp
@@ -375,7 +375,7 @@
 // returning await_suspend that results in a guaranteed tail call to the target
 // coroutine.
 static Expr *maybeTailCall(Sema , QualType RetType, Expr *E,
-   SourceLocation Loc) {
+   SourceLocation Loc, bool IsImplicit) {
   if (RetType->isReferenceType())
 return nullptr;
   Type const *T = RetType.getTypePtr();
@@ -398,10 +398,17 @@
diag::warn_coroutine_handle_address_invalid_return_type)
 << JustAddress->getType();
 
-  // The coroutine handle used to obtain the address is no longer needed
-  // at this point, clean it up to avoid unnecessarily long lifetime which
-  // could lead to unnecessary spilling.
-  JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
+  // After the await_suspend call on the awaiter, the coroutine may have
+  // been destroyed. In that case, we can not store anything to the frame
+  // from this point on. Hence here we wrap it immediately with a cleanup. This
+  // could have applied to all await_suspend calls. However doing so causes
+  // alive objects being destructed for reasons that need further
+  // investigations. Here we walk-around it temporarily by only doing it after
+  // the suspend call on the final awaiter (indicated by IsImplicit) where it's
+  // most common to happen.
+  // TODO: Properly clean up the temps generated by await_suspend calls.
+  if (IsImplicit)
+JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
   return buildBuiltinCall(S, Loc, Builtin::BI__builtin_coro_resume,
   JustAddress);
 }
@@ -409,7 +416,8 @@
 /// Build calls to await_ready, await_suspend, and await_resume for a co_await
 /// expression.
 static ReadySuspendResumeResult buildCoawaitCalls(Sema , VarDecl 
*CoroPromise,
-  SourceLocation Loc, Expr *E) 
{
+  SourceLocation Loc, Expr *E,
+  bool IsImplicit) {
   OpaqueValueExpr *Operand = new (S.Context)
   OpaqueValueExpr(Loc, E->getType(), VK_LValue, E->getObjectKind(), E);
 
@@ -458,7 +466,8 @@
 QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
 // Experimental support for coroutine_handle returning await_suspend.
-if (Expr *TailCallSuspend = maybeTailCall(S, RetType, AwaitSuspend, Loc))
+if (Expr *TailCallSuspend =
+maybeTailCall(S, RetType, AwaitSuspend, Loc, IsImplicit))
   Calls.Results[ACT::ACT_Suspend] = TailCallSuspend;
 else {
   // non-class prvalues always have cv-unqualified types
@@ -870,8 +879,8 @@
   SourceLocation CallLoc = E->getExprLoc();
 
   // Build the await_ready, await_suspend, await_resume calls.
-  ReadySuspendResumeResult RSS =
-  buildCoawaitCalls(*this, Coroutine->CoroutinePromise, CallLoc, E);
+  ReadySuspendResumeResult RSS = buildCoawaitCalls(
+  *this, Coroutine->CoroutinePromise, CallLoc, E, IsImplicit);
   if (RSS.IsInvalid)
 return ExprError();
 
@@ -925,8 +934,8 @@
 E = CreateMaterializeTemporaryExpr(E->getType(), E, true);
 
   // Build the await_ready, await_suspend, await_resume calls.
-  ReadySuspendResumeResult RSS =
-  buildCoawaitCalls(*this, Coroutine->CoroutinePromise, Loc, E);
+  ReadySuspendResumeResult RSS = buildCoawaitCalls(
+  *this, Coroutine->CoroutinePromise, Loc, E, /*IsImplicit*/ false);
   if (RSS.IsInvalid)
 return ExprError();
 


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+// RUN: %clang -std=c++14 

[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 297656.
lxfind added a comment.

Add more comments and TODO


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89066/new/

https://reviews.llvm.org/D89066

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | 
FileCheck %s
 
 #include "Inputs/coroutine.h"
 
Index: clang/lib/Sema/SemaCoroutine.cpp
===
--- clang/lib/Sema/SemaCoroutine.cpp
+++ clang/lib/Sema/SemaCoroutine.cpp
@@ -375,7 +375,7 @@
 // returning await_suspend that results in a guaranteed tail call to the target
 // coroutine.
 static Expr *maybeTailCall(Sema , QualType RetType, Expr *E,
-   SourceLocation Loc) {
+   SourceLocation Loc, bool IsImplicit) {
   if (RetType->isReferenceType())
 return nullptr;
   Type const *T = RetType.getTypePtr();
@@ -398,10 +398,17 @@
diag::warn_coroutine_handle_address_invalid_return_type)
 << JustAddress->getType();
 
-  // The coroutine handle used to obtain the address is no longer needed
-  // at this point, clean it up to avoid unnecessarily long lifetime which
-  // could lead to unnecessary spilling.
-  JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
+  // After the await_suspend call on the awaiter, the coroutine may have
+  // been destroyed. In that case, we can not store anything to the frame
+  // from this point on. Hence here we wrap it immediately with a cleanup. This
+  // could have applied to all await_suspend calls. However doing so causes
+  // alive objects being destructed for reasons that need further
+  // investigations. Here we walk-around it temporarily by only doing it after
+  // the suspend call on the final awaiter (indicated by IsImplicit) where it's
+  // most common to happen.
+  // TODO: Properly clean up the temps generated by await_suspend calls.
+  if (IsImplicit)
+JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
   return buildBuiltinCall(S, Loc, Builtin::BI__builtin_coro_resume,
   JustAddress);
 }
@@ -409,7 +416,8 @@
 /// Build calls to await_ready, await_suspend, and await_resume for a co_await
 /// expression.
 static ReadySuspendResumeResult buildCoawaitCalls(Sema , VarDecl 
*CoroPromise,
-  SourceLocation Loc, Expr *E) 
{
+  SourceLocation Loc, Expr *E,
+  bool IsImplicit) {
   OpaqueValueExpr *Operand = new (S.Context)
   OpaqueValueExpr(Loc, E->getType(), VK_LValue, E->getObjectKind(), E);
 
@@ -458,7 +466,8 @@
 QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
 // Experimental support for coroutine_handle returning await_suspend.
-if (Expr *TailCallSuspend = maybeTailCall(S, RetType, AwaitSuspend, Loc))
+if (Expr *TailCallSuspend =
+maybeTailCall(S, RetType, AwaitSuspend, Loc, IsImplicit))
   Calls.Results[ACT::ACT_Suspend] = TailCallSuspend;
 else {
   // non-class prvalues always have cv-unqualified types
@@ -870,8 +879,8 @@
   SourceLocation CallLoc = E->getExprLoc();
 
   // Build the await_ready, await_suspend, await_resume calls.
-  ReadySuspendResumeResult RSS =
-  buildCoawaitCalls(*this, Coroutine->CoroutinePromise, CallLoc, E);
+  ReadySuspendResumeResult RSS = buildCoawaitCalls(
+  *this, Coroutine->CoroutinePromise, CallLoc, E, IsImplicit);
   if (RSS.IsInvalid)
 return ExprError();
 
@@ -925,8 +934,8 @@
 E = CreateMaterializeTemporaryExpr(E->getType(), E, true);
 
   // Build the await_ready, await_suspend, await_resume calls.
-  ReadySuspendResumeResult RSS =
-  buildCoawaitCalls(*this, Coroutine->CoroutinePromise, Loc, E);
+  ReadySuspendResumeResult RSS = buildCoawaitCalls(
+  *this, Coroutine->CoroutinePromise, Loc, E, /*IsImplicit*/ false);
   if (RSS.IsInvalid)
 return ExprError();
 


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | FileCheck %s
 
 #include "Inputs/coroutine.h"
 
Index: clang/lib/Sema/SemaCoroutine.cpp

[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-12 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D89066#2324291 , @junparser wrote:

> In D89066#2324151 , @lxfind wrote:
>
>> In D89066#2324115 , @junparser 
>> wrote:
>>
>>> why we should not do this with normal await call?
>>
>> To be honest, I don't know yet. My understanding of how expression cleanup 
>> and temp lifetime management is insufficient at the moment.
>> But first of all, without adding any cleanup expression here, I saw ASAN 
>> failures due to heap-use-after-free, because sometimes the frame have 
>> already been destroyed after the await_suspend call, and yet we are still 
>> writing into the frame due to unnecessarily cross-suspend lifetime. However, 
>> if I apply the cleanup to all await_suepend calls, it also causes ASAN 
>> failures as it's cleaning up data that's still alive.
>> So this patch is more of a temporary walkaround to stop bleeding without 
>> causing any trouble.
>> I plan to get back to this latter after I am done with the spilling/alloca 
>> issues.
>
> I'm not familiar with ASAN instrumentation. Do you have any testcases to 
> explain this?

Unfortunately I don't.  But this is not related to ASAN. Basically, this is 
causing destructing of objects that should still be alive. I suspect that it's 
because ExprWithCleanups always clean up temps that belongs to the full 
expression, not just the sub-expression in it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89066/new/

https://reviews.llvm.org/D89066

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-11 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D89066#2324115 , @junparser wrote:

> why we should not do this with normal await call?

To be honest, I don't know yet. My understanding of how expression cleanup and 
temp lifetime management is insufficient at the moment.
But first of all, without adding any cleanup expression here, I saw ASAN 
failures due to heap-use-after-free, because sometimes the frame have already 
been destroyed after the await_suspend call, and yet we are still writing into 
the frame due to unnecessarily cross-suspend lifetime. However, if I apply the 
cleanup to all await_suepend calls, it also causes ASAN failures as it's 
cleaning up data that's still alive.
So this patch is more of a temporary walkaround to stop bleeding without 
causing any trouble.
I plan to get back to this latter after I am done with the spilling/alloca 
issues.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89066/new/

https://reviews.llvm.org/D89066

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D89066: [Coroutine][Sema] Only tighten the suspend call temp lifetime for final awaiter

2020-10-08 Thread Xun Li via Phabricator via cfe-commits
lxfind created this revision.
lxfind added reviewers: lewissbaker, wenlei, bruno, junparser, rjmccall.
Herald added subscribers: cfe-commits, modimo, dexonsmith, modocache.
Herald added a project: clang.
lxfind requested review of this revision.

In https://reviews.llvm.org/D87470 I added the change to tighten the lifetime 
of the expression awaiter.await_suspend().address.
Howver it was incorrect. ExprWithCleanups will call the dtor and end the 
lifetime for all the temps created in the current full expr.
When this is called on a normal await call, we don't want to do that.
We only want to do this for the call on the final_awaiter, to avoid writing 
into the frame after the frame is destroyed.
This change fixes it, by checking IsImplicit.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D89066

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o - | 
FileCheck %s
 
 #include "Inputs/coroutine.h"
 
Index: clang/lib/Sema/SemaCoroutine.cpp
===
--- clang/lib/Sema/SemaCoroutine.cpp
+++ clang/lib/Sema/SemaCoroutine.cpp
@@ -375,7 +375,7 @@
 // returning await_suspend that results in a guaranteed tail call to the target
 // coroutine.
 static Expr *maybeTailCall(Sema , QualType RetType, Expr *E,
-   SourceLocation Loc) {
+   SourceLocation Loc, bool IsImplicit) {
   if (RetType->isReferenceType())
 return nullptr;
   Type const *T = RetType.getTypePtr();
@@ -398,10 +398,12 @@
diag::warn_coroutine_handle_address_invalid_return_type)
 << JustAddress->getType();
 
-  // The coroutine handle used to obtain the address is no longer needed
-  // at this point, clean it up to avoid unnecessarily long lifetime which
-  // could lead to unnecessary spilling.
-  JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
+  // After the suspend call on the final awaiter, the coroutine may have
+  // been destroyed. In that case, we can not store anything to the frame
+  // from this point on. Hence in the case of the final awaiter suspend
+  // call (indicated by IsImplciit), we wrap it immediately with a cleanup.
+  if (IsImplicit)
+JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
   return buildBuiltinCall(S, Loc, Builtin::BI__builtin_coro_resume,
   JustAddress);
 }
@@ -409,7 +411,8 @@
 /// Build calls to await_ready, await_suspend, and await_resume for a co_await
 /// expression.
 static ReadySuspendResumeResult buildCoawaitCalls(Sema , VarDecl 
*CoroPromise,
-  SourceLocation Loc, Expr *E) 
{
+  SourceLocation Loc, Expr *E,
+  bool IsImplicit) {
   OpaqueValueExpr *Operand = new (S.Context)
   OpaqueValueExpr(Loc, E->getType(), VK_LValue, E->getObjectKind(), E);
 
@@ -458,7 +461,8 @@
 QualType RetType = AwaitSuspend->getCallReturnType(S.Context);
 
 // Experimental support for coroutine_handle returning await_suspend.
-if (Expr *TailCallSuspend = maybeTailCall(S, RetType, AwaitSuspend, Loc))
+if (Expr *TailCallSuspend =
+maybeTailCall(S, RetType, AwaitSuspend, Loc, IsImplicit))
   Calls.Results[ACT::ACT_Suspend] = TailCallSuspend;
 else {
   // non-class prvalues always have cv-unqualified types
@@ -870,8 +874,8 @@
   SourceLocation CallLoc = E->getExprLoc();
 
   // Build the await_ready, await_suspend, await_resume calls.
-  ReadySuspendResumeResult RSS =
-  buildCoawaitCalls(*this, Coroutine->CoroutinePromise, CallLoc, E);
+  ReadySuspendResumeResult RSS = buildCoawaitCalls(
+  *this, Coroutine->CoroutinePromise, CallLoc, E, IsImplicit);
   if (RSS.IsInvalid)
 return ExprError();
 
@@ -925,8 +929,8 @@
 E = CreateMaterializeTemporaryExpr(E->getType(), E, true);
 
   // Build the await_ready, await_suspend, await_resume calls.
-  ReadySuspendResumeResult RSS =
-  buildCoawaitCalls(*this, Coroutine->CoroutinePromise, Loc, E);
+  ReadySuspendResumeResult RSS = buildCoawaitCalls(
+  *this, Coroutine->CoroutinePromise, Loc, E, /*IsImplicit*/ false);
   if (RSS.IsInvalid)
 return ExprError();
 


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -1,4 +1,4 @@
-// RUN: %clang 

[PATCH] D87470: [Coroutine][Sema] Tighten the lifetime of symmetric transfer returned handle

2020-09-15 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D87470#2273310 , @junparser wrote:

> @lxfind , could you backport this to branch 11?

I am actually seeing some problems with this change. Still investigating.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87470/new/

https://reviews.llvm.org/D87470

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87470: [Coroutine][Sema] Tighten the lifetime of symmetric transfer returned handle

2020-09-11 Thread Xun Li via Phabricator via cfe-commits
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
Closed by commit rGdf477db5f9e0: [Coroutine][Sema] Tighten the lifetime of 
symmetric transfer returned handle (authored by lxfind).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87470/new/

https://reviews.llvm.org/D87470

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/CodeGenCoroutines/Inputs/coroutine.h
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -0,0 +1,53 @@
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct detached_task {
+  struct promise_type {
+detached_task get_return_object() noexcept {
+  return 
detached_task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() noexcept {}
+
+struct final_awaiter {
+  bool await_ready() noexcept { return false; }
+  coro::coroutine_handle<> 
await_suspend(coro::coroutine_handle h) noexcept {
+h.destroy();
+return {};
+  }
+  void await_resume() noexcept {}
+};
+
+void unhandled_exception() noexcept {}
+
+final_awaiter final_suspend() noexcept { return {}; }
+
+coro::suspend_always initial_suspend() noexcept { return {}; }
+  };
+
+  ~detached_task() {
+if (coro_) {
+  coro_.destroy();
+  coro_ = {};
+}
+  }
+
+  void start() && {
+auto tmp = coro_;
+coro_ = {};
+tmp.resume();
+  }
+
+  coro::coroutine_handle coro_;
+};
+
+detached_task foo() {
+  co_return;
+}
+
+// check that the lifetime of the coroutine handle used to obtain the address 
ended right away.
+// CHECK:   %{{.*}} = call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 nonnull %{{.*}})
+// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
Index: clang/test/CodeGenCoroutines/Inputs/coroutine.h
===
--- clang/test/CodeGenCoroutines/Inputs/coroutine.h
+++ clang/test/CodeGenCoroutines/Inputs/coroutine.h
@@ -15,7 +15,7 @@
 return me;
   }
   void operator()() { resume(); }
-  void *address() const { return ptr; }
+  void *address() const noexcept { return ptr; }
   void resume() const { __builtin_coro_resume(ptr); }
   void destroy() const { __builtin_coro_destroy(ptr); }
   bool done() const { return __builtin_coro_done(ptr); }
Index: clang/lib/Sema/SemaCoroutine.cpp
===
--- clang/lib/Sema/SemaCoroutine.cpp
+++ clang/lib/Sema/SemaCoroutine.cpp
@@ -398,6 +398,10 @@
diag::warn_coroutine_handle_address_invalid_return_type)
 << JustAddress->getType();
 
+  // The coroutine handle used to obtain the address is no longer needed
+  // at this point, clean it up to avoid unnecessarily long lifetime which
+  // could lead to unnecessary spilling.
+  JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
   return buildBuiltinCall(S, Loc, Builtin::BI__builtin_coro_resume,
   JustAddress);
 }


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -0,0 +1,53 @@
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct detached_task {
+  struct promise_type {
+detached_task get_return_object() noexcept {
+  return detached_task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() noexcept {}
+
+struct final_awaiter {
+  bool await_ready() noexcept { return false; }
+  coro::coroutine_handle<> await_suspend(coro::coroutine_handle h) noexcept {
+h.destroy();
+return {};
+  }
+  void await_resume() noexcept {}
+};
+
+void unhandled_exception() noexcept {}
+
+final_awaiter final_suspend() noexcept { return {}; }
+
+coro::suspend_always initial_suspend() noexcept { return {}; }
+  };
+
+  ~detached_task() {
+if (coro_) {
+  coro_.destroy();
+  coro_ = {};
+}
+  }
+
+  void start() && {
+auto tmp = coro_;
+coro_ = {};
+tmp.resume();
+  }
+
+  coro::coroutine_handle coro_;
+};
+
+detached_task foo() {
+  co_return;
+}
+
+// check that the lifetime of the coroutine handle used to obtain the address ended right away.
+// CHECK:   %{{.*}} = call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull %{{.*}})
+// CHECK-NEXT:  call void 

[PATCH] D87470: [Coroutine][Sema] Tighten the lifetime of symmetric transfer returned handle

2020-09-11 Thread Xun Li via Phabricator via cfe-commits
lxfind added a comment.

In D87470#2268911 , @rjmccall wrote:

> Thanks, LGTM.

Thank you for reviewing and the suggestions on testcase!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87470/new/

https://reviews.llvm.org/D87470

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D87470: [Coroutine][Sema] Tighten the lifetime of symmetric transfer returned handle

2020-09-11 Thread Xun Li via Phabricator via cfe-commits
lxfind updated this revision to Diff 291324.
lxfind added a comment.

remove asan option, not needed


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87470/new/

https://reviews.llvm.org/D87470

Files:
  clang/lib/Sema/SemaCoroutine.cpp
  clang/test/CodeGenCoroutines/Inputs/coroutine.h
  clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -0,0 +1,53 @@
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct detached_task {
+  struct promise_type {
+detached_task get_return_object() noexcept {
+  return 
detached_task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() noexcept {}
+
+struct final_awaiter {
+  bool await_ready() noexcept { return false; }
+  coro::coroutine_handle<> 
await_suspend(coro::coroutine_handle h) noexcept {
+h.destroy();
+return {};
+  }
+  void await_resume() noexcept {}
+};
+
+void unhandled_exception() noexcept {}
+
+final_awaiter final_suspend() noexcept { return {}; }
+
+coro::suspend_always initial_suspend() noexcept { return {}; }
+  };
+
+  ~detached_task() {
+if (coro_) {
+  coro_.destroy();
+  coro_ = {};
+}
+  }
+
+  void start() && {
+auto tmp = coro_;
+coro_ = {};
+tmp.resume();
+  }
+
+  coro::coroutine_handle coro_;
+};
+
+detached_task foo() {
+  co_return;
+}
+
+// check that the lifetime of the coroutine handle used to obtain the address 
ended right away.
+// CHECK:   %{{.*}} = call i8* 
@{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"*
 nonnull %{{.*}})
+// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
Index: clang/test/CodeGenCoroutines/Inputs/coroutine.h
===
--- clang/test/CodeGenCoroutines/Inputs/coroutine.h
+++ clang/test/CodeGenCoroutines/Inputs/coroutine.h
@@ -15,7 +15,7 @@
 return me;
   }
   void operator()() { resume(); }
-  void *address() const { return ptr; }
+  void *address() const noexcept { return ptr; }
   void resume() const { __builtin_coro_resume(ptr); }
   void destroy() const { __builtin_coro_destroy(ptr); }
   bool done() const { return __builtin_coro_done(ptr); }
Index: clang/lib/Sema/SemaCoroutine.cpp
===
--- clang/lib/Sema/SemaCoroutine.cpp
+++ clang/lib/Sema/SemaCoroutine.cpp
@@ -398,6 +398,10 @@
diag::warn_coroutine_handle_address_invalid_return_type)
 << JustAddress->getType();
 
+  // The coroutine handle used to obtain the address is no longer needed
+  // at this point, clean it up to avoid unnecessarily long lifetime which
+  // could lead to unnecessary spilling.
+  JustAddress = S.MaybeCreateExprWithCleanups(JustAddress);
   return buildBuiltinCall(S, Loc, Builtin::BI__builtin_coro_resume,
   JustAddress);
 }


Index: clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
===
--- /dev/null
+++ clang/test/CodeGenCoroutines/coro-semmetric-transfer.cpp
@@ -0,0 +1,53 @@
+// RUN: %clang -std=c++14 -fcoroutines-ts -emit-llvm -S -O1 %s -o -
+
+#include "Inputs/coroutine.h"
+
+namespace coro = std::experimental::coroutines_v1;
+
+struct detached_task {
+  struct promise_type {
+detached_task get_return_object() noexcept {
+  return detached_task{coro::coroutine_handle::from_promise(*this)};
+}
+
+void return_void() noexcept {}
+
+struct final_awaiter {
+  bool await_ready() noexcept { return false; }
+  coro::coroutine_handle<> await_suspend(coro::coroutine_handle h) noexcept {
+h.destroy();
+return {};
+  }
+  void await_resume() noexcept {}
+};
+
+void unhandled_exception() noexcept {}
+
+final_awaiter final_suspend() noexcept { return {}; }
+
+coro::suspend_always initial_suspend() noexcept { return {}; }
+  };
+
+  ~detached_task() {
+if (coro_) {
+  coro_.destroy();
+  coro_ = {};
+}
+  }
+
+  void start() && {
+auto tmp = coro_;
+coro_ = {};
+tmp.resume();
+  }
+
+  coro::coroutine_handle coro_;
+};
+
+detached_task foo() {
+  co_return;
+}
+
+// check that the lifetime of the coroutine handle used to obtain the address ended right away.
+// CHECK:   %{{.*}} = call i8* @{{.*address.*}}(%"struct.std::experimental::coroutines_v1::coroutine_handle.0"* nonnull %{{.*}})
+// CHECK-NEXT:  call void @llvm.lifetime.end.p0i8(i64 8, i8* nonnull %{{.*}})
Index: clang/test/CodeGenCoroutines/Inputs/coroutine.h

  1   2   >