Re: [PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

2016-03-21 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Here are two other approaches we considered and rejected, for the record:

1. Copy-paste a <complex> implementation from e.g. libc++ into 
__clang_cuda_runtime_wrapper.h, and edit it appropriately.  Then #define the 
real <complex>'s include guards.

  Main problem with this is the obvious one: We're copying a big chunk of the 
standard library into the compiler, where it doesn't belong, and now we have 
two divergent copies of this code to maintain.  In addition, we can't 
necessarily use libc++, since we need to support pre-C++11, and AIUI libc++ 
does not.



2. Provide `__device__` overloads for all the functions defined in <complex>.  
This almost works, except that we do not (currently) have a way to let you 
inject new overloads for member functions into classes we don't own.  E.g. we 
can add a `__device__` overload of `std::real(const complex<T>&)`, just like we 
could overload `std::real` in any other way, but we can't add a new 
`__device__` overload to `std::complex<T>::real()`.

  This approach also has a similar problem to (1), which is that we'd end up 
copy/pasting almost all of <complex> into the compiler.
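
For concreteness, a sketch of what does and doesn't work under this approach 
(assuming a libstdc++-style <complex>; signatures simplified):

  namespace std {
  // Fine: an additional __device__ overload of a free function can live
  // alongside the existing host one.
  template <class T> __device__ T real(const complex<T> &);
  // Not possible: there is no syntax for adding a __device__ overload of the
  // member std::complex<T>::real() from outside the class definition, and we
  // don't own the class.
  }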



Comment at: include/clang/Driver/Options.td:383-384
@@ -382,2 +382,4 @@
   HelpText<"Enable device-side debug info generation. Disables ptxas 
optimizations.">;
+def cuda_allow_std_complex : Flag<["--"], "cuda-allow-std-complex">,
+  HelpText<"Allow CUDA device code to use definitions from <complex>, other 
than operator>> and operator<<.">;
 def cuda_path_EQ : Joined<["--"], "cuda-path=">, Group<i_Group>,

tra wrote:
> rsmith wrote:
> > I don't think it's reasonable to have something this hacky / arbitrary in 
> > the stable Clang driver interface.
> What would be a better way to enable this 'feature'? I guess we could live 
> with -Xclang -fcuda-allow-std-complex for now, but that does not seem to be a 
> particularly good way to give users control, either.
> 
> Perhaps we should have some sort of --cuda-enable-extension=foo option to 
> control CUDA hacks.
> I don't think it's reasonable to have something this hacky / arbitrary in the 
> stable Clang driver interface.

This is an important feature for a lot of projects, including tensorflow and 
eigen.  No matter how we define the flag, I suspect people are going to use it 
en masse.  (Most projects I've seen pass the equivalent flag to nvcc.)  At the 
point that many or even most projects are relying on it, I'd suspect we'll have 
difficulty changing this flag, regardless of whether or not it is officially 
part of our stable API.

There's also the issue of discoverability.  nvcc actually gives a nice error 
message when you try to use std::complex -- it seems pretty unfriendly not to 
even list the relevant flag in clang --help.

I don't feel particularly strongly about this, though -- I'm more concerned 
about getting something that works.


http://reviews.llvm.org/D18328



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

2016-03-21 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Thanks for the suggestions, Richard.  I'm not sure any of them will work, but I 
don't defend this patch as anything other than a hack, so if we can come up 
with something that works for what we need to accomplish and is cleaner, that's 
great.

In http://reviews.llvm.org/D18328#379824, @rsmith wrote:

> I would much prefer for us to, say, provide a <complex> header that wraps the 
> system one and does something like
>
>   // <complex>
>   #pragma clang cuda_implicit_host_device {
>   #include_next <complex>
>   #pragma clang cuda_implicit_host_device }


We considered this and ruled it out for two reasons:

1. We'd have to exclude operator>> and operator<<, presumably with additional 
pragmas, and
2. We'd have to exclude everything included by <complex>.

Of course with enough pragmas anything is possible, but at this point it seemed 
to become substantially more complicated than this (admittedly awful) hack.
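
For illustration, the wrapper would have had to look something like this 
(hypothetical -- neither the pragma nor an exclusion mechanism exists today):

  // Wrapper <complex>, sketch only:
  #include <sstream>  // hoist <complex>'s own includes out of the region
  #pragma clang cuda_implicit_host_device {
  #include_next <complex>
  #pragma clang cuda_implicit_host_device }
  // ...plus still more pragmas to carve operator<< and operator>> back out.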

> or to provide an explicit list of the functions that we're promoting to 
> `__host__` `__device__`


The problem with that is that libstdc++ uses many helper functions, which we'd 
also have to enumerate.  Baking those kinds of implementation details into 
clang seemed worse than this hack.
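
As an example of the helpers involved, in libstdc++ (implementation details 
that vary by version) std::abs on a complex bottoms out in reserved-name 
machinery, roughly:

  // <complex>, libstdc++, simplified:
  template <typename _Tp>
  inline _Tp abs(const complex<_Tp> &__z) { return __complex_abs(__z.__rep()); }

so an explicit list would have to name __complex_abs and friends for every 
such function, for every library version we support.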

> or to require people to use a CUDA-compatible standard library if they want 
> CUDA-compatible standard library behaviour.


I think asking people to use a custom standard library is a nonstarter for e.g. 
OSS tensorflow, and I suspect it would be a considerable amount of work to 
accomplish in google3.  (Not to suggest that two wrongs make a right, but we 
already have many similar hacks in place to match nvcc's behavior with standard 
library functions -- the main difference here is that we're spelling the hack 
in clang's C++ as opposed to in __clang_cuda_runtime_wrapper.h.)


http://reviews.llvm.org/D18328



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

2016-03-21 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Sema/SemaCUDA.cpp:474
@@ +473,3 @@
+  SourceLocation Loc = FD.getLocation();
+  if (!SM.isInSystemHeader(Loc))
+    return false;

tra wrote:
> Can C++ library headers ever be non-system? I.e. can someone use libc++ via 
> -I ?
> 
Good question, I have no idea if that's supposed to work.  Reid, do you know?


http://reviews.llvm.org/D18328



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D18328: [CUDA] Add option to mark most functions inside <complex> as host+device.

2016-03-21 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added reviewers: tra, rnk.
jlebar added subscribers: cfe-commits, jhen.

clang --cuda-allow-std-complex translates into cc1
-fcuda-allow-std-complex.  With this flag, we will mark all functions
inside <complex> within namespace std as host+device, other than
operator>> and operator<<, which use ostreams, which are not supported
in CUDA device code.

http://reviews.llvm.org/D18328

Files:
  include/clang/Basic/LangOptions.def
  include/clang/Driver/CC1Options.td
  include/clang/Driver/Options.td
  include/clang/Sema/Sema.h
  lib/Driver/Tools.cpp
  lib/Frontend/CompilerInvocation.cpp
  lib/Sema/SemaCUDA.cpp
  lib/Sema/SemaDecl.cpp
  test/Driver/cuda-complex.cu
  test/SemaCUDA/Inputs/complex
  test/SemaCUDA/complex.cu

Index: test/SemaCUDA/complex.cu
===
--- /dev/null
+++ test/SemaCUDA/complex.cu
@@ -0,0 +1,27 @@
+// RUN: %clang_cc1 -triple nvptx-unknown-cuda -fsyntax-only -fcuda-allow-std-complex -fcuda-is-device -isystem "%S/Inputs" -verify %s
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -fcuda-allow-std-complex -isystem "%S/Inputs" -verify %s
+
+// Checks that functions inside a system header named <complex> are marked as
+// host+device.
+
+#include <cuda.h>
+#include <complex>
+
+using std::complex;
+using std::real;
+
+void __device__ foo() {
+  complex<float> x;
+  complex<float> y(x);
+  y += x;
+  x + y;
+  real(complex<float>(1, 2));
+
+  // Our <complex> header defines complex-to-complex operator<< and operator>>,
+  // but these are not implicitly marked as host+device.
+
+  x << y; // expected-error {{invalid operands to binary expression}}
+  // expected-note@complex:* {{call to __host__ function from __device__ function}}
+  x >> y; // expected-error {{invalid operands to binary expression}}
+  // expected-note@complex:* {{call to __host__ function from __device__ function}}
+}
Index: test/SemaCUDA/Inputs/complex
===
--- /dev/null
+++ test/SemaCUDA/Inputs/complex
@@ -0,0 +1,30 @@
+// Incomplete stub of  used to check that we properly annotate these
+// functions as host+device.
+
+namespace std {
+
+template <class T>
+class complex {
+ public:
+  complex(const T & = T(), const T & = T());
+  complex &operator+=(const complex &);
+
+ private:
+  T real;
+  T imag;
+};
+
+template <class T>
+complex<T> operator+(const complex<T> &, const complex<T> &);
+
+template <class T>
+T real(const complex<T> &);
+
+// Stream operators are not marked as host+device.
+template <class T>
+void operator<<(const complex<T> &, const complex<T> &);
+
+template <class T>
+void operator>>(const complex<T> &, const complex<T> &);
+
+} // namespace std
Index: test/Driver/cuda-complex.cu
===
--- /dev/null
+++ test/Driver/cuda-complex.cu
@@ -0,0 +1,15 @@
+// Tests CUDA compilation pipeline construction in Driver.
+// REQUIRES: clang-driver
+
+// Check that --cuda-allow-std-complex passes -fcuda-allow-std-complex to cc1.
+// RUN: %clang -### -target x86_64-linux-gnu --cuda-allow-std-complex -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix ALLOW-COMPLEX %s
+
+// ALLOW-COMPLEX: -fcuda-allow-std-complex
+
+// But if we don't pass --cuda-allow-std-complex, we don't pass
+// -fcuda-allow-std-complex to cc1.
+// RUN: %clang -### -target x86_64-linux-gnu -c %s 2>&1 \
+// RUN: | FileCheck -check-prefix NO-ALLOW-COMPLEX %s
+
+// NO-ALLOW-COMPLEX-NOT: -fcuda-allow-std-complex
Index: lib/Sema/SemaDecl.cpp
===
--- lib/Sema/SemaDecl.cpp
+++ lib/Sema/SemaDecl.cpp
@@ -8340,6 +8340,12 @@
 isExplicitSpecialization || isFunctionTemplateSpecialization);
   }
 
+  // CUDA: Some decls in system headers get an implicit __host__ __device__.
+  if (getLangOpts().CUDA && declShouldBeCUDAHostDevice(*NewFD)) {
+NewFD->addAttr(CUDADeviceAttr::CreateImplicit(Context));
+NewFD->addAttr(CUDAHostAttr::CreateImplicit(Context));
+  }
+
   if (getLangOpts().CPlusPlus) {
 if (FunctionTemplate) {
   if (NewFD->isInvalidDecl())
Index: lib/Sema/SemaCUDA.cpp
===
--- lib/Sema/SemaCUDA.cpp
+++ lib/Sema/SemaCUDA.cpp
@@ -14,11 +14,13 @@
 #include "clang/Sema/Sema.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/Decl.h"
+#include "clang/AST/DeclTemplate.h"
 #include "clang/AST/ExprCXX.h"
 #include "clang/Lex/Preprocessor.h"
 #include "clang/Sema/SemaDiagnostic.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringSet.h"
 using namespace clang;
 
ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,
@@ -450,3 +452,44 @@
 
   return true;
 }
+
+// Everything within namespace std inside <complex> should be host+device,
+// except operator<< and operator>> (ostreams aren't supported in CUDA device
+// code).  Whitelisting the functions we want, rather than blacklisting the
+// stream operators, is a tempting alternative, but libstdc++ uses many helper
+// functions that we would also have to enumerate.
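
[The digest cuts the patch off at this point.  For orientation, a sketch of 
the likely shape of the check, reconstructed from the visible hunks; the 
option name CUDAAllowStdComplex and the exact body are guesses, not the 
actual patch:]

  bool Sema::declShouldBeCUDAHostDevice(const FunctionDecl &FD) {
    if (!getLangOpts().CUDAAllowStdComplex || !FD.isInStdNamespace())
      return false;
    SourceManager &SM = getSourceManager();
    SourceLocation Loc = FD.getLocation();
    if (!SM.isInSystemHeader(Loc))
      return false;
    // Only declarations lexically within a header named "complex".
    if (llvm::sys::path::filename(SM.getFilename(Loc)) != "complex")
      return false;
    // Leave the stream operators alone; ostreams don't work in device code.
    OverloadedOperatorKind Op = FD.getOverloadedOperator();
    return Op != OO_LessLess && Op != OO_GreaterGreater;
  }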

[PATCH] D18327: [sema] [CUDA] Use std algorithms in EraseUnwantedCUDAMatchesImpl.

2016-03-21 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: tra.
jlebar added a subscriber: cfe-commits.

NFC

http://reviews.llvm.org/D18327

Files:
  lib/Sema/SemaCUDA.cpp

Index: lib/Sema/SemaCUDA.cpp
===
--- lib/Sema/SemaCUDA.cpp
+++ lib/Sema/SemaCUDA.cpp
@@ -210,31 +210,28 @@
   return false;
 }
 
-template <typename T, typename FetchDeclFn>
-static void EraseUnwantedCUDAMatchesImpl(Sema &S, const FunctionDecl *Caller,
-                                         llvm::SmallVectorImpl<T> &Matches,
-                                         FetchDeclFn FetchDecl) {
+template <typename T>
+static void EraseUnwantedCUDAMatchesImpl(
+    Sema &S, const FunctionDecl *Caller, llvm::SmallVectorImpl<T> &Matches,
+    std::function<const FunctionDecl *(const T &)> FetchDecl) {
   assert(S.getLangOpts().CUDATargetOverloads &&
  "Should not be called w/o enabled target overloads.");
   if (Matches.size() <= 1)
 return;
 
+  // Gets the CUDA function preference for a call from Caller to Match.
+  auto GetCFP = [&](const T &Match) {
+    return S.IdentifyCUDAPreference(Caller, FetchDecl(Match));
+  };
+
   // Find the best call preference among the functions in Matches.
-  Sema::CUDAFunctionPreference P, BestCFP = Sema::CFP_Never;
-  for (auto const &Match : Matches) {
-    P = S.IdentifyCUDAPreference(Caller, FetchDecl(Match));
-    if (P > BestCFP)
-      BestCFP = P;
-  }
+  Sema::CUDAFunctionPreference BestCFP = GetCFP(*std::max_element(
+      Matches.begin(), Matches.end(),
+      [&](const T &M1, const T &M2) { return GetCFP(M1) < GetCFP(M2); }));
 
   // Erase all functions with lower priority.
-  for (unsigned I = 0, N = Matches.size(); I != N;)
-    if (S.IdentifyCUDAPreference(Caller, FetchDecl(Matches[I])) < BestCFP) {
-      Matches[I] = Matches[--N];
-      Matches.resize(N);
-    } else {
-      ++I;
-    }
+  Matches.erase(llvm::remove_if(Matches, [&](const T &Match) {
+    return GetCFP(Match) < BestCFP;
+  }), Matches.end());
 }
 
 void Sema::EraseUnwantedCUDAMatches(const FunctionDecl *Caller,


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D18219: Add -cuda-relaxed-constexpr, which lets CUDA device code call constexpr functions.

2016-03-19 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Abandoning this for now -- it is a maze of twisty passages all alike.  Some std 
math functions are constexpr, and making them host+device affects all our 
existing math overloads in fun and exciting ways.

We may need this at some point, but for now I'm going to try something more 
finely targeted specifically at std::complex.


http://reviews.llvm.org/D18219



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D18219: Add -cuda-relaxed-constexpr, which lets CUDA device code call constexpr functions.

2016-03-19 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Actually, this isn't enough.  It works fine for std::complex, but it screws 
with our existing std math business, because some of the host functions that 
we're trying to override are constexpr, and now they're treated as constexpr 
host-device.


http://reviews.llvm.org/D18219



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D18051: [CUDA] Provide CUDA's vector types implemented using clang's vector extension.

2016-03-10 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:72
@@ -71,1 +71,3 @@
 
+#if defined(CUDA_VECTOR_TYPES)
+// Prevent inclusion of CUDA's vector_types.h

The compiler driver is responsible for enabling/disabling language extensions, 
and for choosing exactly which dialect we accept.  It's also responsible for 
deciding which optimizations to use.  This feature fits the driver's role in 
all of those ways.

Moreover, again, -Dfoo won't appear in --help, so, from a user's perspective, 
it is undiscoverable.  In the event that they do discover it somehow, there's no 
documentation attached to the flag.

I am not aware of any switches built into clang that rely on -D.  If you really 
want to do it this way, can you point me to prior art?


Comment at: lib/Headers/__clang_cuda_vector_types.h:81
@@ +80,3 @@
+  : x(__x), y(__y), z(__z) {}
+  __attribute__((host, device)) explicit dim3(uint3 __a)
+  : x(__a.x), y(__a.y), z(__a.z) {}

Huh, apparently we do want to use the reserved namespace?

If so, this logic applies very strongly to a -D, which is going to be far more 
user-visible than the arg names here.


Comment at: lib/Headers/__clang_cuda_vector_types.h:83
@@ +82,3 @@
+  : x(__a.x), y(__a.y), z(__a.z) {}
+  __attribute__((host, device)) operator uint3(void) { return {x, y, z}; }
+};

If I'm understanding correctly, you're saying that if we have

  struct dim3 {
dim3(unsigned, unsigned, unsigned);
dim3(uint3);
  };

  void foo(dim3);

that the call

  uint3 x;
  foo(x);

is ambiguous, because it could call either dim3 constructor overload?

That is bizarre, but if so, do we need the dim3(uint3) constructor at all?


http://reviews.llvm.org/D18051



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D18051: [CUDA] Provide CUDA's vector types implemented using clang's vector extension.

2016-03-10 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:72
@@ -71,1 +71,3 @@
 
+#if defined(CUDA_VECTOR_TYPES)
+// Prevent inclusion of CUDA's vector_types.h

Hm, this is a surprising (to me) way of controlling this feature.  Can we use a 
-f flag instead?  Even if all that -f flag does is define something (although 
in this case I'd suggest giving it a longer name so it's harder to collide with 
it).

-fsomething would be more discoverable and canonical, I think, and would be 
easier to document.


Comment at: lib/Headers/__clang_cuda_vector_types.h:76
@@ +75,3 @@
+
+__attribute__((host,device))
+struct dim3 {

I thought host/device attributes weren't needed on classes, only functions?


Comment at: lib/Headers/__clang_cuda_vector_types.h:80
@@ +79,3 @@
+  __attribute__((host, device))
+  dim3(unsigned __x = 1, unsigned __y = 1, unsigned __z = 1)
+  : x(__x), y(__y), z(__z) {}

Nit: double underscore is a little weird here, and sort of needlessly competes 
with the language-reserved __ identifier namespace.  Could we just use one 
underscore?


Comment at: lib/Headers/__clang_cuda_vector_types.h:82
@@ +81,3 @@
+  : x(__x), y(__y), z(__z) {}
+  __attribute__((host, device)) explicit dim3(uint3 __a)
+  : x(__a.x), y(__a.y), z(__a.z) {}

nvidia's version of this function is not explicit -- is this difference 
intentional?


Comment at: lib/Headers/__clang_cuda_vector_types.h:84
@@ +83,3 @@
+  : x(__a.x), y(__a.y), z(__a.z) {}
+  __attribute__((host, device)) operator uint3(void) { return {x, y, z}; }
+};

This requires C++11 -- is that intentional?


http://reviews.llvm.org/D18051



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17779: [CUDA] Emit host-side 'shadows' for device-side global variables

2016-03-01 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/CodeGen/CGCUDANV.cpp:168
@@ -163,1 +167,3 @@
+/// of global scope device-side variables generated in this module
+/// with the CUDA runtime.
 /// \code

This is kind of hard to parse.  How about rephrasing to something like:

Creates a function that sets up state on the host side for CUDA objects that 
have a presence on both the host and device sides.  Specifically, registers the 
host side of kernel functions and __device__ global variables with the CUDA 
runtime.


Comment at: lib/CodeGen/CGCUDANV.cpp:213
@@ +212,3 @@
+  // void __cudaRegisterVar(void **, char *, char *, const char *,
+  //                        int, int, int, int)
+  std::vector<llvm::Type *> RegisterVarParams = {

Can we say what these args mean?
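
For reference, my understanding of the cudart convention (parameter names 
taken from CUDA's crt/host_runtime.h -- treat them as an assumption, they are 
not defined by this patch):

  // void __cudaRegisterVar(void **fatCubinHandle,
  //                        char *hostVar,          // host-side address
  //                        char *deviceAddress,    // device-side symbol
  //                        const char *deviceName,
  //                        int ext,                // declared extern?
  //                        int size,               // sizeof(var)
  //                        int constant,           // __constant__?
  //                        int global);            // unused, pass 0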


Comment at: lib/CodeGen/CGCUDANV.cpp:224
@@ +223,3 @@
+llvm::Constant *VarName = makeConstantString(Var->getName());
+llvm::Value *args[] = {
+    &GpuBinaryHandlePtr, Builder.CreateBitCast(Var, VoidPtrTy), VarName,

Nit: s/args/Args/?


Comment at: lib/CodeGen/CGCUDANV.cpp:228
@@ +227,3 @@
+llvm::ConstantInt::get(IntTy, CGM.getDataLayout().getTypeAllocSize(
+  Var->getValueType())), // sizeof(var)
+llvm::ConstantInt::get(IntTy, (Flags & DevVarConst) ? 1 : 0),

Nit: Maybe pull this expression out as a separate var?  Then the comment isn't 
needed (would be nice, because at the moment it's ambiguous exactly what 
"sizeof(var)" refers to).


Comment at: lib/CodeGen/CodeGenModule.cpp:1532
@@ +1531,3 @@
+  // We need to emit host-side 'shadows' for all global
+  // device-side variables because CUDA runtime API needs their
+  // size and host-side address in order to provide access to

s/CUDA runtime API/the CUDA runtime/ (not really a requirement of the API, I 
think?)


Comment at: lib/CodeGen/CodeGenModule.cpp:1575
@@ +1574,3 @@
+  // definition, because we still need to define host-side shadow
+  // for it.
+} else if (VD->isThisDeclarationADefinition() != VarDecl::Definition &&

Kind of an odd way of writing this control flow?  Could we phrase it more 
idiomatically as

  MustEmitForCUDA = !VD->hasDefinition() && ...;
  if (!MustEmitForCUDA && ...) return;


Comment at: lib/CodeGen/CodeGenModule.cpp:2477
@@ +2476,3 @@
+  if (D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>()) {
+Linkage = llvm::GlobalValue::InternalLinkage;
+

Is it worth explaining why the shadows get internal linkage?


Comment at: lib/CodeGen/CodeGenModule.cpp:2480
@@ +2479,3 @@
+// Shadow variables and their properties must be registered
+// with CUDA runtime.
+unsigned Flags = 0;

with the CUDA runtime


Comment at: lib/CodeGen/CodeGenModule.cpp:2483
@@ +2482,3 @@
+if (!D->hasDefinition())
+  Flags |= CGCUDARuntime::DevVarExt;
+if (D->hasAttr<CUDAConstantAttr>())

Now that I see them in context, I think these flags would be a lot easier to 
handle if they employed less abbreviation.  "ExternalDeviceVar", 
"ConstDeviceVar"?


Comment at: test/CodeGenCUDA/device-stub.cu:14
@@ +13,3 @@
+
+// Make sure host globals don't get internalized..
+// CHECK-DAG: @host_var = global i32

Not sure if this is a typo or if you mean "...".


Comment at: test/CodeGenCUDA/device-stub.cu:17
@@ +16,3 @@
+int host_var;
+// .. and that extern vars remain external.
+// CHECK-DAG: @ext_host_var = external global i32

Here you do seem to mean "..."


http://reviews.llvm.org/D17779



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: r260760 - [libclang] Separate the underlying indexing functionality of libclang and introduce it into the clangIndex library.

2016-02-29 Thread Justin Lebar via cfe-commits
If the test shouldn't require large stack usage, should the not_ubsan,
not_asan tags be removed?

On Mon, Feb 29, 2016 at 9:55 PM, Justin Lebar  wrote:
> Works!  Thank you for fixing this, and so quickly.
>
> On Mon, Feb 29, 2016 at 6:51 PM, Argyrios Kyrtzidis  wrote:
>> Try with r262290.
>>
>>> On Feb 29, 2016, at 6:23 PM, Argyrios Kyrtzidis  wrote:
>>>
>>> Ah, I see the problem in the code, will fix shortly.
>>>
 On Feb 29, 2016, at 5:43 PM, Justin Lebar  wrote:

> How are you configuring

 cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang
 -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_ASSERTIONS=On
 -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_MAN=true ../llvm

> and what compiler and version are you using ?

 $ clang --version
 Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final)
 (based on LLVM 3.5.0)
 Target: x86_64-pc-linux-gnu
 Thread model: posix

 I do not see the stack overflow or segfault in debug builds.  I do see
 it regardless of whether or not I enable assertions.  I also do not
 see the stack overflow in a release build if I reduce the amount of
 nesting in the test to about 2/3 of its present value.

 On Mon, Feb 29, 2016 at 5:38 PM, Argyrios Kyrtzidis  
 wrote:
> I don’t quite understand how it gets that stack trace, dataTraverseNode() 
> was introduced to avoid exactly this.
> How are you configuring and what compiler and version are you using ?
>
>> On Feb 29, 2016, at 3:14 PM, Justin Lebar  wrote:
>>
>>> Is this still an issue after r260785 ?
>>
>> I just sync'ed to r262268 and was able to reproduce the segfault.
>>
>>> Could you provide a stack trace ?
>>
>> $ gdb --args release/bin/c-index-test -index-file
>> /usr/local/google/home/jlebar/code/llvm/src/tools/clang/test/Index/index-many-call-ops.cpp
>> (gdb) run
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x72ca2700 (LWP 3936)]
>> 0x74f264be in
>> clang::DeclarationName::print(llvm::raw_ostream&,
>> clang::PrintingPolicy const&) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> (gdb) bt
>> #0  0x74f264be in
>> clang::DeclarationName::print(llvm::raw_ostream&,
>> clang::PrintingPolicy const&) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #1  0x74f26c9e in clang::operator<<(llvm::raw_ostream&,
>> clang::DeclarationName) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #2  0x74ebcb91 in
>> clang::cxindex::CXIndexDataConsumer::getEntityInfo(clang::NamedDecl
>> const*, clang::cxindex::EntityInfo&, clang::cxindex::ScratchAlloc&) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #3  0x74ebc206 in
>> clang::cxindex::CXIndexDataConsumer::handleReference(clang::NamedDecl
>> const*, clang::SourceLocation, CXCursor, clang::NamedDecl const*,
>> clang::DeclContext const*, clang::Expr const*, CXIdxEntityRefKind) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #4  0x74ebbae2 in
>> clang::cxindex::CXIndexDataConsumer::handleDeclOccurence(clang::Decl
>> const*, unsigned int, llvm::ArrayRef,
>> clang::FileID, unsigned int,
>> clang::index::IndexDataConsumer::ASTNodeInfo) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #5  0x751a05c5 in
>> clang::index::IndexingContext::handleDeclOccurrence(clang::Decl
>> const*, clang::SourceLocation, bool, clang::Decl const*, unsigned int,
>> llvm::ArrayRef, clang::Expr const*,
>> clang::Decl const*, clang::DeclContext const*) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #6  0x751a092a in
>> clang::index::IndexingContext::handleReference(clang::NamedDecl
>> const*, clang::SourceLocation, clang::NamedDecl const*,
>> clang::DeclContext const*, unsigned int,
>> llvm::ArrayRef, clang::Expr const*,
>> clang::Decl const*) ()
>> from 
>> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
>> #7  0x751a9256 in clang::RecursiveASTVisitor<(anonymous
>> namespace)::BodyIndexer>::dataTraverseNode(clang::Stmt*,
>> llvm::SmallVectorImpl> llvm::PointerLikeTypeTraits,
>> llvm::PointerIntPairInfo> 

Re: r260760 - [libclang] Separate the underlying indexing functionality of libclang and introduce it into the clangIndex library.

2016-02-29 Thread Justin Lebar via cfe-commits
Works!  Thank you for fixing this, and so quickly.

On Mon, Feb 29, 2016 at 6:51 PM, Argyrios Kyrtzidis  wrote:
> Try with r262290.
>
>> On Feb 29, 2016, at 6:23 PM, Argyrios Kyrtzidis  wrote:
>>
>> Ah, I see the problem in the code, will fix shortly.
>>
>>> On Feb 29, 2016, at 5:43 PM, Justin Lebar  wrote:
>>>
 How are you configuring
>>>
>>> cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=clang
>>> -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_ASSERTIONS=On
>>> -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_MAN=true ../llvm
>>>
 and what compiler and version are you using ?
>>>
>>> $ clang --version
>>> Ubuntu clang version 3.5.0-4ubuntu2~trusty2 (tags/RELEASE_350/final)
>>> (based on LLVM 3.5.0)
>>> Target: x86_64-pc-linux-gnu
>>> Thread model: posix
>>>
>>> I do not see the stack overflow or segfault in debug builds.  I do see
>>> it regardless of whether or not I enable assertions.  I also do not
>>> see the stack overflow in a release build if I reduce the amount of
>>> nesting in the test to about 2/3 of its present value.
>>>
>>> On Mon, Feb 29, 2016 at 5:38 PM, Argyrios Kyrtzidis  
>>> wrote:
 I don’t quite understand how it gets that stack trace, dataTraverseNode() 
 was introduced to avoid exactly this.
 How are you configuring and what compiler and version are you using ?

> On Feb 29, 2016, at 3:14 PM, Justin Lebar  wrote:
>
>> Is this still an issue after r260785 ?
>
> I just sync'ed to r262268 and was able to reproduce the segfault.
>
>> Could you provide a stack trace ?
>
> $ gdb --args release/bin/c-index-test -index-file
> /usr/local/google/home/jlebar/code/llvm/src/tools/clang/test/Index/index-many-call-ops.cpp
> (gdb) run
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x72ca2700 (LWP 3936)]
> 0x74f264be in
> clang::DeclarationName::print(llvm::raw_ostream&,
> clang::PrintingPolicy const&) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> (gdb) bt
> #0  0x74f264be in
> clang::DeclarationName::print(llvm::raw_ostream&,
> clang::PrintingPolicy const&) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #1  0x74f26c9e in clang::operator<<(llvm::raw_ostream&,
> clang::DeclarationName) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #2  0x74ebcb91 in
> clang::cxindex::CXIndexDataConsumer::getEntityInfo(clang::NamedDecl
> const*, clang::cxindex::EntityInfo&, clang::cxindex::ScratchAlloc&) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #3  0x74ebc206 in
> clang::cxindex::CXIndexDataConsumer::handleReference(clang::NamedDecl
> const*, clang::SourceLocation, CXCursor, clang::NamedDecl const*,
> clang::DeclContext const*, clang::Expr const*, CXIdxEntityRefKind) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #4  0x74ebbae2 in
> clang::cxindex::CXIndexDataConsumer::handleDeclOccurence(clang::Decl
> const*, unsigned int, llvm::ArrayRef,
> clang::FileID, unsigned int,
> clang::index::IndexDataConsumer::ASTNodeInfo) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #5  0x751a05c5 in
> clang::index::IndexingContext::handleDeclOccurrence(clang::Decl
> const*, clang::SourceLocation, bool, clang::Decl const*, unsigned int,
> llvm::ArrayRef, clang::Expr const*,
> clang::Decl const*, clang::DeclContext const*) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #6  0x751a092a in
> clang::index::IndexingContext::handleReference(clang::NamedDecl
> const*, clang::SourceLocation, clang::NamedDecl const*,
> clang::DeclContext const*, unsigned int,
> llvm::ArrayRef, clang::Expr const*,
> clang::Decl const*) ()
> from 
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #7  0x751a9256 in clang::RecursiveASTVisitor<(anonymous
> namespace)::BodyIndexer>::dataTraverseNode(clang::Stmt*,
> llvm::SmallVectorImpl llvm::PointerLikeTypeTraits,
> llvm::PointerIntPairInfo llvm::PointerLikeTypeTraits > > >*) () from
> /usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
> #8  0x751a60dd in clang::RecursiveASTVisitor<(anonymous
> namespace)::BodyIndexer>::TraverseStmt(clang::Stmt*,
> 

Re: [PATCH] D17581: [CUDA] disable attribute-based overloading for __global__ functions.

2016-02-29 Thread Justin Lebar via cfe-commits
jlebar closed this revision.
jlebar added a comment.

Landed in http://reviews.llvm.org/rL261778.


http://reviews.llvm.org/D17581



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: r260760 - [libclang] Separate the underlying indexing functionality of libclang and introduce it into the clangIndex library.

2016-02-29 Thread Justin Lebar via cfe-commits
> Is this still an issue after r260785 ?

I just sync'ed to r262268 and was able to reproduce the segfault.

> Could you provide a stack trace ?

$ gdb --args release/bin/c-index-test -index-file
/usr/local/google/home/jlebar/code/llvm/src/tools/clang/test/Index/index-many-call-ops.cpp
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x72ca2700 (LWP 3936)]
0x74f264be in
clang::DeclarationName::print(llvm::raw_ostream&,
clang::PrintingPolicy const&) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
(gdb) bt
#0  0x74f264be in
clang::DeclarationName::print(llvm::raw_ostream&,
clang::PrintingPolicy const&) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#1  0x74f26c9e in clang::operator<<(llvm::raw_ostream&,
clang::DeclarationName) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#2  0x74ebcb91 in
clang::cxindex::CXIndexDataConsumer::getEntityInfo(clang::NamedDecl
const*, clang::cxindex::EntityInfo&, clang::cxindex::ScratchAlloc&) ()
from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#3  0x74ebc206 in
clang::cxindex::CXIndexDataConsumer::handleReference(clang::NamedDecl
const*, clang::SourceLocation, CXCursor, clang::NamedDecl const*,
clang::DeclContext const*, clang::Expr const*, CXIdxEntityRefKind) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#4  0x74ebbae2 in
clang::cxindex::CXIndexDataConsumer::handleDeclOccurence(clang::Decl
const*, unsigned int, llvm::ArrayRef,
clang::FileID, unsigned int,
clang::index::IndexDataConsumer::ASTNodeInfo) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#5  0x751a05c5 in
clang::index::IndexingContext::handleDeclOccurrence(clang::Decl
const*, clang::SourceLocation, bool, clang::Decl const*, unsigned int,
llvm::ArrayRef, clang::Expr const*,
clang::Decl const*, clang::DeclContext const*) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#6  0x751a092a in
clang::index::IndexingContext::handleReference(clang::NamedDecl
const*, clang::SourceLocation, clang::NamedDecl const*,
clang::DeclContext const*, unsigned int,
llvm::ArrayRef, clang::Expr const*,
clang::Decl const*) ()
   from 
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#7  0x751a9256 in clang::RecursiveASTVisitor<(anonymous
namespace)::BodyIndexer>::dataTraverseNode(clang::Stmt*,
llvm::SmallVectorImpl,
llvm::PointerIntPairInfo > > >*) () from
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#8  0x751a60dd in clang::RecursiveASTVisitor<(anonymous
namespace)::BodyIndexer>::TraverseStmt(clang::Stmt*,
llvm::SmallVectorImpl,
llvm::PointerIntPairInfo > > >*) () from
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#9  0x751a880e in clang::RecursiveASTVisitor<(anonymous
namespace)::BodyIndexer>::dataTraverseNode(clang::Stmt*,
llvm::SmallVectorImpl,
llvm::PointerIntPairInfo > > >*) () from
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#10 0x751a60dd in clang::RecursiveASTVisitor<(anonymous
namespace)::BodyIndexer>::TraverseStmt(clang::Stmt*,
llvm::SmallVectorImpl,
llvm::PointerIntPairInfo > > >*) () from
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#11 0x751a880e in clang::RecursiveASTVisitor<(anonymous
namespace)::BodyIndexer>::dataTraverseNode(clang::Stmt*,
llvm::SmallVectorImpl,
llvm::PointerIntPairInfo > > >*) () from
/usr/local/google/home/jlebar/code/llvm-complete/release/bin/../lib/libclang.so.3.9
#12 0x751a60dd in clang::RecursiveASTVisitor<(anonymous
namespace)::BodyIndexer>::TraverseStmt(clang::Stmt*,
llvm::SmallVectorImpl,
llvm::PointerIntPairInfo > > >*) () from

Re: r260760 - [libclang] Separate the underlying indexing functionality of libclang and introduce it into the clangIndex library.

2016-02-29 Thread Justin Lebar via cfe-commits
Hi, I think this broke clang/test/Index/index-many-call-ops.cpp.  I
get a segfault due to recursive stack overflow in release builds on my
linux x86-64 box when running that test -- it seems the purpose of
that test is to check that we *don't* segfault.

This has been broken for a while, and people don't seem to be
complaining -- I'm not sure if there's something specific about my
config or if I'm just the first person to bother bisecting this...
Maybe the test just needs to be pared down some?

Please advise.

Regards,
-Justin

On Sat, Feb 13, 2016 at 12:47 PM, Argyrios Kyrtzidis via cfe-commits
 wrote:
> I guess refreshing the build directory fixed the bots now.
> You may want to look into ccache as possibly the issue here.
>
>> On Feb 13, 2016, at 12:08 PM, Argyrios Kyrtzidis  wrote:
>>
>> clangIndex library is not part of that command so I don’t understand how my 
>> changes affect linking the clang binary, clangIndex is only used for 
>> libclang.
>>
>>> On Feb 13, 2016, at 11:55 AM, Argyrios Kyrtzidis  wrote:
>>>
>>> Sorry, I looked at it but it wasn’t clear to me what the problem is:
>>>
>>> FAILED: : 
>>> /home/linaro/buildbot/clang-cmake-armv7-a15/stage1/tools/clang/clang.order: 
>>> file not recognized: File truncated
>>>
>>> I thought it was some build SNAFU, do you have some hint on what I need to 
>>> do to fix this ?
>>>
 On Feb 13, 2016, at 7:48 AM, Renato Golin  wrote:

 On 12 February 2016 at 23:11, Argyrios Kyrtzidis via cfe-commits
  wrote:
> Author: akirtzidis
> Date: Fri Feb 12 17:10:59 2016
> New Revision: 260760
>
> URL: http://llvm.org/viewvc/llvm-project?rev=260760&view=rev
> Log:
> [libclang] Separate the underlying indexing functionality of libclang and 
> introduce it into the clangIndex library.
>
> It is a general goodness for libclang itself to mostly be a wrapper of 
> functionality provided by the libraries.

 Hi,

 This broke both ARM builds:

 http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/9853

 http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15/builds/9806

 And it's still broken. I'm going to refresh the build directory, but
 please keep an eye or revert if that doesn't work.

 cheers,
 --renato
>>>
>>
>
> ___
> cfe-commits mailing list
> cfe-commits@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: r261774 - Bail on compilation as soon as a job fails.

2016-02-29 Thread Justin Lebar via cfe-commits
I think the reason this becomes tricky is that you can reuse Action
nodes for multiple invocations of cc1 (or your favorite tool).  We do
this on macos with bindarch for universal binaries.

So you can't necessarily point to one CompilationAction node and say
"*that* failed with an error", with no ambiguity.

On Mon, Feb 29, 2016 at 12:50 PM, David Blaikie <dblai...@gmail.com> wrote:
>
>
> On Mon, Feb 29, 2016 at 11:19 AM, Justin Lebar via cfe-commits
> <cfe-commits@lists.llvm.org> wrote:
>>
>> > Can you expand on this? The idea is that those implicit deps are in
>> > addition to regular dependencies, not a replacement.
>>
>> Sure.
>>
>> What I was trying to say is, suppose you have two CUDA device
>> compilations.  You want device compilation B to bail if device
>> compilation A fails.  So what are the implicit dependencies of B?
>>
>> Well, B must implicitly depend on everything in A that may trigger a
>> failure.  Which is, I recall, four actions: preprocessor, compilation,
>> backend, and assemble.  It may in fact be one or two more because we
>> have these CUDADeviceActions that wrap other actions, and maybe we'd
>> have to depend on those too.
>
>
> Yeah, that doesn't sound quite right - presumably our link action doesn't
> depend on all those steps, just the last one. If any others fail, the rest
> can't run (because they depend on them in turn) so the link won't run.
>
> (but, yeah, I'm no Driver expert by any means..)
>
>>
>>
>> Or maybe the thing you're proposing is smart enough that we only need
>> to depend on the assemble action and it will infer the rest correctly.
>> Either way, I was just saying that if you add another action, C, it
>> now has to depend on everything from A and B, and so on.
>>
>> > One problem I have with this change is that it breaks part of my patch
>> > ;-)
>>
>> Now it's my turn to admit that I didn't quite follow the rest of this.  :)
>>
>> > if you're in the office tomorrow, maybe we could discuss this for half
>> > an hour or so in person?
>>
>> Yes, absolutely, let's talk irl and figure this out.
>>
>> On Mon, Feb 29, 2016 at 10:43 AM, Nico Weber <tha...@chromium.org> wrote:
>> > On Sun, Feb 28, 2016 at 3:40 PM, Justin Lebar <jle...@google.com> wrote:
>> >>
>> >> On Sun, Feb 28, 2016 at 1:46 PM, Nico Weber <tha...@chromium.org>
>> >> wrote:
>> >> > Do you think something like the implicit inputs thing in
>> >> > http://reviews.llvm.org/D17695 could work for you as well, instead of
>> >> > this
>> >> > patch?
>> >>
>> >> Having read just the patch description, I think it would be workable,
>> >> although it might not be particularly clean.  I think you'd have to
>> >> make most (all?) of the intermediate actions associated with a
>> >> compilation be implicit inputs to all later actions in all other
>> >> compilations.  Otherwise things like saving vs. not saving temps would
>> >> give you different behavior.
>> >
>> >
>> > Can you expand on this? The idea is that those implicit deps are in
>> > addition
>> > to regular dependencies, not a replacement.
>> >
>> >>
>> >> It seems like kind of a fragile way to get this behavior.
>> >>
>> >> > Then we don't have to forever promise to compile all .cc input files
>> >> > serially.
>> >>
>> >> I've thought about this some, since we don't really want this even for
>> >> CUDA.  With CUDA, you commonly want to compile the device code for
>> >> different architectures, and it would be reasonable to do those
>> >> compilations in parallel.
>> >>
>> >> What I think may be a reasonable promise, at least for CUDA, is that
>> >> we will behave *as if* we're compiling in series.  Which just means
>> >> not interleaving diagnostics from different compilations, and not
>> >> showing diagnostics from other subcompilations after we hit an error.
>> >>
>> >> I think basically nobody wants randomly-interleaved diagnostics.
>> Whether or not there's a good use-case where we continue after we hit an
>> >> error is a harder question, I'm not sure.  But in any case, stopping
>> >> after an error shouldn't mean we're forced to serialize, I think.
>> >
>> >
>> > One problem I have with this change is that it breaks part of my patch
>> > ;-)

Re: r261774 - Bail on compilation as soon as a job fails.

2016-02-29 Thread Justin Lebar via cfe-commits
> Can you expand on this? The idea is that those implicit deps are in addition 
> to regular dependencies, not a replacement.

Sure.

What I was trying to say is, suppose you have two CUDA device
compilations.  You want device compilation B to bail if device
compilation A fails.  So what are the implicit dependencies of B?

Well, B must implicitly depend on everything in A that may trigger a
failure.  Which is, I recall, four actions: preprocessor, compilation,
backend, and assemble.  It may in fact be one or two more because we
have these CUDADeviceActions that wrap other actions, and maybe we'd
have to depend on those too.

Or maybe the thing you're proposing is smart enough that we only need
to depend on the assemble action and it will infer the rest correctly.
Either way, I was just saying that if you add another action, C, it
now has to depend on everything from A and B, and so on.

> One problem I have with this change is that it breaks part of my patch ;-)

Now it's my turn to admit that I didn't quite follow the rest of this.  :)

> if you're in the office tomorrow, maybe we could discuss this for half an 
> hour or so in person?

Yes, absolutely, let's talk irl and figure this out.

On Mon, Feb 29, 2016 at 10:43 AM, Nico Weber <tha...@chromium.org> wrote:
> On Sun, Feb 28, 2016 at 3:40 PM, Justin Lebar <jle...@google.com> wrote:
>>
>> On Sun, Feb 28, 2016 at 1:46 PM, Nico Weber <tha...@chromium.org> wrote:
>> > Do you think something like the implicit inputs thing in
>> > http://reviews.llvm.org/D17695 could work for you as well, instead of
>> > this
>> > patch?
>>
>> Having read just the patch description, I think it would be workable,
>> although it might not be particularly clean.  I think you'd have to
>> make most (all?) of the intermediate actions associated with a
>> compilation be implicit inputs to all later actions in all other
>> compilations.  Otherwise things like saving vs. not saving temps would
>> give you different behavior.
>
>
> Can you expand on this? The idea is that those implicit deps are in addition
> to regular dependencies, not a replacement.
>
>>
>> It seems like kind of a fragile way to get this behavior.
>>
>> > Then we don't have to forever promise to compile all .cc input files
>> > serially.
>>
>> I've thought about this some, since we don't really want this even for
>> CUDA.  With CUDA, you commonly want to compile the device code for
>> different architectures, and it would be reasonable to do those
>> compilations in parallel.
>>
>> What I think may be a reasonable promise, at least for CUDA, is that
>> we will behave *as if* we're compiling in series.  Which just means
>> not interleaving diagnostics from different compilations, and not
>> showing diagnostics from other subcompilations after we hit an error.
>>
>> I think basically nobody wants randomly-interleaved diagnostics.
>> Whether or not there's a good use-case where we continue after we hit an
>> error is a harder question, I'm not sure.  But in any case, stopping
>> after an error shouldn't mean we're forced to serialize, I think.
>
>
> One problem I have with this change is that it breaks part of my patch ;-)
> I'd like to run two commands, one to build a pch, and another to compile a
> source file using the just-built pch, and if the first fails I don't want to
> run the second: `make_pch && compile`. This part still works after your
> patch. But we also have a flag /fallback that says "if compilation falls,
> try again with this other compiler. Ideally I want `(make_pch && compile) ||
> fallback_compile`, but since it's a bit of a corner case and arguably good
> enough, my patch did `make_pch ; compile || fallback`. Your change makes it
> impossible to run commands after one another if one fails, so now this is
> `make_pch && (compile || fallback)`, i.e. if compilation of the pch fails
> the fallback compiler won't be invoked.
>
> I can try looking at maybe making FallbackCommand a FallbackAction instead
> or something. But since CUDA seems to not fit the internal model of the
> driver super well (lots of isCUDA() calls in many places), maybe it'd make
> sense to discuss what your requirements are and if it's possible to extend
> the Action / Job / Command abstractions in a way that support CUDA without
> as many special cases. I happen to be in Mountain View – if you're in the
> office tomorrow, maybe we could discuss this for half an hour or so in
> person?
>
>>
>>
>> -Justin
>>
>> > On Wed, Feb 24, 2016 at 4:49 PM, Justin Lebar via cfe-commits
>> > <cfe-commits@lists.llvm.o

Re: r261774 - Bail on compilation as soon as a job fails.

2016-02-28 Thread Justin Lebar via cfe-commits
On Sun, Feb 28, 2016 at 1:46 PM, Nico Weber <tha...@chromium.org> wrote:
> Do you think something like the implicit inputs thing in
> http://reviews.llvm.org/D17695 could work for you as well, instead of this
> patch?

Having read just the patch description, I think it would be workable,
although it might not be particularly clean.  I think you'd have to
make most (all?) of the intermediate actions associated with a
compilation be implicit inputs to all later actions in all other
compilations.  Otherwise things like saving vs. not saving temps would
give you different behavior.

It seems like kind of a fragile way to get this behavior.

> Then we don't have to forever promise to compile all .cc input files
> serially.

I've thought about this some, since we don't really want this even for
CUDA.  With CUDA, you commonly want to compile the device code for
different architectures, and it would be reasonable to do those
compilations in parallel.

What I think may be a reasonable promise, at least for CUDA, is that
we will behave *as if* we're compiling in series.  Which just means
not interleaving diagnostics from different compilations, and not
showing diagnostics from other subcompilations after we hit an error.

I think basically nobody wants randomly-interleaved diagnostics.
Whether or not there's a good use-case where we continue after we hit an
error is a harder question, I'm not sure.  But in any case, stopping
after an error shouldn't mean we're forced to serialize, I think.

-Justin

> On Wed, Feb 24, 2016 at 4:49 PM, Justin Lebar via cfe-commits
> <cfe-commits@lists.llvm.org> wrote:
>>
>> Author: jlebar
>> Date: Wed Feb 24 15:49:28 2016
>> New Revision: 261774
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=261774&view=rev
>> Log:
>> Bail on compilation as soon as a job fails.
>>
>> Summary:
>> (Re-land of r260448, which was reverted in r260522 due to a test failure
>> in Driver/output-file-cleanup.c that only showed up in fresh builds.)
>>
>> Previously we attempted to be smart; if one job failed, we'd run all
>> jobs that didn't depend on the failing job.
>>
>> Problem is, this doesn't work well for e.g. CUDA compilation without
>> -save-temps.  In this case, the device-side and host-side Assemble
>> actions (which actually are responsible for preprocess, compile,
>> backend, and assemble, since we're not saving temps) are necessarily
>> distinct.  So our clever heuristic doesn't help us, and we repeat every
>> error message once for host and once for each device arch.
>>
>> The main effect of this change, other than fixing CUDA, is that if you
>> pass multiple cc files to one instance of clang and you get a compile
>> error, we'll stop when the first cc1 job fails.
>>
>> Reviewers: echristo
>>
>> Subscribers: cfe-commits, jhen, echristo, tra, rafael
>>
>> Differential Revision: http://reviews.llvm.org/D17217
>>
>> Modified:
>> cfe/trunk/lib/Driver/Compilation.cpp
>> cfe/trunk/test/Driver/output-file-cleanup.c
>>
>> Modified: cfe/trunk/lib/Driver/Compilation.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Compilation.cpp?rev=261774&r1=261773&r2=261774&view=diff
>>
>> ==
>> --- cfe/trunk/lib/Driver/Compilation.cpp (original)
>> +++ cfe/trunk/lib/Driver/Compilation.cpp Wed Feb 24 15:49:28 2016
>> @@ -163,39 +163,17 @@ int Compilation::ExecuteCommand(const Co
>>return ExecutionFailed ? 1 : Res;
>>  }
>>
>> -typedef SmallVectorImpl< std::pair<int, const Command *> >
>> FailingCommandList;
>> -
>> -static bool ActionFailed(const Action *A,
>> - const FailingCommandList &FailingCommands) {
>> -
>> -  if (FailingCommands.empty())
>> -return false;
>> -
>> -  for (FailingCommandList::const_iterator CI = FailingCommands.begin(),
>> - CE = FailingCommands.end(); CI != CE; ++CI)
>> -if (A == &(CI->second->getSource()))
>> -  return true;
>> -
>> -  for (const Action *AI : A->inputs())
>> -if (ActionFailed(AI, FailingCommands))
>> -  return true;
>> -
>> -  return false;
>> -}
>> -
>> -static bool InputsOk(const Command &C,
>> - const FailingCommandList &FailingCommands) {
>> -  return !ActionFailed(&C.getSource(), FailingCommands);
>> -}
>> -
>> -void Compilation::ExecuteJobs(const JobList &Jobs,
>> -  FailingCommandList &FailingCommands) const
>> {
>> +void Compilation::ExecuteJobs(
>> +const JobList &Jobs,
>> +SmallVectorImpl<std::pair<int, const Command *>> &FailingCommands) const {

Re: [PATCH] D17313: [CUDA] Annotate all calls in CUDA device mode as convergent.

2016-02-24 Thread Justin Lebar via cfe-commits
jlebar abandoned this revision.
jlebar added a comment.

Subsumed by http://reviews.llvm.org/D17056.


http://reviews.llvm.org/D17313



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r261779 - [CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent.

2016-02-24 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 24 15:55:11 2016
New Revision: 261779

URL: http://llvm.org/viewvc/llvm-project?rev=261779&view=rev
Log:
[CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent.

Summary:
This is important for e.g. the following case:

  void sync() { __syncthreads(); }
  void foo() {
do_something();
sync();
do_something_else():
  }

Without this change, if the optimizer does not inline sync() (which it
won't because __syncthreads is also marked as noduplicate, for now
anyway), it is free to perform optimizations on sync() that it would not
be able to perform on __syncthreads(), because sync() is not marked as
convergent.

Similarly, we need a notion of convergent calls, since in the case when
we can't statically determine a call's target(s), we need to know
whether it's safe to perform optimizations around the call.

This change is conservative; the optimizer will remove these attrs where
it can, see r260318, r260319.

Reviewers: majnemer

Subscribers: cfe-commits, jhen, echristo, tra

Differential Revision: http://reviews.llvm.org/D17056

Added:
cfe/trunk/test/CodeGenCUDA/convergent.cu
Modified:
cfe/trunk/lib/CodeGen/CGCall.cpp
cfe/trunk/test/CodeGenCUDA/device-var-init.cu

Modified: cfe/trunk/lib/CodeGen/CGCall.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCall.cpp?rev=261779&r1=261778&r2=261779&view=diff
==
--- cfe/trunk/lib/CodeGen/CGCall.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGCall.cpp Wed Feb 24 15:55:11 2016
@@ -1595,6 +1595,14 @@ void CodeGenModule::ConstructAttributeLi
 }
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+// Conservatively, mark all functions and calls in CUDA as convergent
+// (meaning, they may call an intrinsically convergent op, such as
+// __syncthreads(), and so can't have certain optimizations applied around
+// them).  LLVM will remove this attribute where it safely can.
+FuncAttrs.addAttribute(llvm::Attribute::Convergent);
+  }
+
   ClangToLLVMArgMapping IRFunctionArgs(getContext(), FI);
 
   QualType RetTy = FI.getReturnType();

Added: cfe/trunk/test/CodeGenCUDA/convergent.cu
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/convergent.cu?rev=261779&view=auto
==
--- cfe/trunk/test/CodeGenCUDA/convergent.cu (added)
+++ cfe/trunk/test/CodeGenCUDA/convergent.cu Wed Feb 24 15:55:11 2016
@@ -0,0 +1,39 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() {
+  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
+  baz();
+}
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+// DEVICE: attributes [[CALL_ATTR]] = { convergent }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }

Modified: cfe/trunk/test/CodeGenCUDA/device-var-init.cu
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/device-var-init.cu?rev=261779&r1=261778&r2=261779&view=diff
==
--- cfe/trunk/test/CodeGenCUDA/device-var-init.cu (original)
+++ cfe/trunk/test/CodeGenCUDA/device-var-init.cu Wed Feb 24 15:55:11 2016
@@ -382,7 +382,7 @@ __device__ void df() {
 // CHECK:   call void @_ZN4NETCC1IJEEEDpT_(%struct.NETC* %netc)
 // CHECK:   call void @_ZN7EC_I_ECC1Ev(%struct.EC_I_EC* %ec_i_ec)
 // CHECK:   call void @_ZN8EC_I_EC1C1Ev(%struct.EC_I_EC1* %ec_i_ec1)
-// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t) #3
+// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t)
 // CHECK:   call void @_ZN7T_B_NECC1Ev(%struct.T_B_NEC* %t_b_nec)
 // CHECK:   call void @_ZN7T_F_NECC1Ev(%struct.T_F_NEC* %t_f_nec)
 // CHECK:   call void @_ZN8T_FA_NECC1Ev(%struct.T_FA_NEC* %t_fa_nec)


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-24 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL261779: [CUDA] Mark all CUDA device-side function defs, 
decls, and calls as convergent. (authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17056?vs=48261&id=48979#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17056

Files:
  cfe/trunk/lib/CodeGen/CGCall.cpp
  cfe/trunk/test/CodeGenCUDA/convergent.cu
  cfe/trunk/test/CodeGenCUDA/device-var-init.cu

Index: cfe/trunk/lib/CodeGen/CGCall.cpp
===================================================================
--- cfe/trunk/lib/CodeGen/CGCall.cpp
+++ cfe/trunk/lib/CodeGen/CGCall.cpp
@@ -1595,6 +1595,14 @@
 }
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+    // Conservatively, mark all functions and calls in CUDA as convergent
+    // (meaning, they may call an intrinsically convergent op, such as
+    // __syncthreads(), and so can't have certain optimizations applied around
+    // them).  LLVM will remove this attribute where it safely can.
+    FuncAttrs.addAttribute(llvm::Attribute::Convergent);
+  }
+
   ClangToLLVMArgMapping IRFunctionArgs(getContext(), FI);
 
   QualType RetTy = FI.getReturnType();
Index: cfe/trunk/test/CodeGenCUDA/convergent.cu
===================================================================
--- cfe/trunk/test/CodeGenCUDA/convergent.cu
+++ cfe/trunk/test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,39 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() {
+  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
+  baz();
+}
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+// DEVICE: attributes [[CALL_ATTR]] = { convergent }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: cfe/trunk/test/CodeGenCUDA/device-var-init.cu
===================================================================
--- cfe/trunk/test/CodeGenCUDA/device-var-init.cu
+++ cfe/trunk/test/CodeGenCUDA/device-var-init.cu
@@ -382,7 +382,7 @@
 // CHECK:   call void @_ZN4NETCC1IJEEEDpT_(%struct.NETC* %netc)
 // CHECK:   call void @_ZN7EC_I_ECC1Ev(%struct.EC_I_EC* %ec_i_ec)
 // CHECK:   call void @_ZN8EC_I_EC1C1Ev(%struct.EC_I_EC1* %ec_i_ec1)
-// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t) #3
+// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t)
 // CHECK:   call void @_ZN7T_B_NECC1Ev(%struct.T_B_NEC* %t_b_nec)
 // CHECK:   call void @_ZN7T_F_NECC1Ev(%struct.T_F_NEC* %t_f_nec)
 // CHECK:   call void @_ZN8T_FA_NECC1Ev(%struct.T_FA_NEC* %t_fa_nec)



Re: [PATCH] D17561: [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.

2016-02-24 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL261777: [CUDA] Add conversion operators for threadIdx, 
blockIdx, gridDim, and… (authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17561?vs=48884&id=48978#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17561

Files:
  cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
  cfe/trunk/lib/Headers/cuda_builtin_vars.h

Index: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
===================================================================
--- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
+++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -245,6 +245,33 @@
 }
 } // namespace std
 
+// Out-of-line implementations from cuda_builtin_vars.h.  These need to come
+// after we've pulled in the definition of uint3 and dim3.
+
+__device__ inline __cuda_builtin_threadIdx_t::operator uint3() const {
+  uint3 ret;
+  ret.x = x;
+  ret.y = y;
+  ret.z = z;
+  return ret;
+}
+
+__device__ inline __cuda_builtin_blockIdx_t::operator uint3() const {
+  uint3 ret;
+  ret.x = x;
+  ret.y = y;
+  ret.z = z;
+  return ret;
+}
+
+__device__ inline __cuda_builtin_blockDim_t::operator dim3() const {
+  return dim3(x, y, z);
+}
+
+__device__ inline __cuda_builtin_gridDim_t::operator dim3() const {
+  return dim3(x, y, z);
+}
+
 #include <__clang_cuda_cmath.h>
 
 // curand_mtgp32_kernel helpfully redeclares blockDim and threadIdx in host
Index: cfe/trunk/lib/Headers/cuda_builtin_vars.h
===================================================================
--- cfe/trunk/lib/Headers/cuda_builtin_vars.h
+++ cfe/trunk/lib/Headers/cuda_builtin_vars.h
@@ -24,10 +24,14 @@
 #ifndef __CUDA_BUILTIN_VARS_H
 #define __CUDA_BUILTIN_VARS_H
 
+// Forward declares from vector_types.h.
+struct uint3;
+struct dim3;
+
 // The file implements built-in CUDA variables using __declspec(property).
 // https://msdn.microsoft.com/en-us/library/yhfk0thd.aspx
 // All read accesses of built-in variable fields get converted into calls to a
-// getter function which in turn would call appropriate builtin to fetch the
+// getter function which in turn calls the appropriate builtin to fetch the
 // value.
 //
 // Example:
@@ -63,30 +67,42 @@
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_tid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_tid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_tid_z());
+  // threadIdx should be convertible to uint3 (in fact in nvcc, it *is* a
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_threadIdx_t);
 };
 
 struct __cuda_builtin_blockIdx_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_ctaid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_ctaid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_ctaid_z());
+  // blockIdx should be convertible to uint3 (in fact in nvcc, it *is* a
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_blockIdx_t);
 };
 
 struct __cuda_builtin_blockDim_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_ntid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_ntid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_ntid_z());
+  // blockDim should be convertible to dim3 (in fact in nvcc, it *is* a
+  // dim3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator dim3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_blockDim_t);
 };
 
 struct __cuda_builtin_gridDim_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_nctaid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_nctaid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_nctaid_z());
+  // gridDim should be convertible to dim3 (in fact in nvcc, it *is* a
+  // dim3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator dim3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_gridDim_t);
 };
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17216: Make test/Driver/output-file-cleanup.c hermetic.

2016-02-24 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL261773: Make test/Driver/output-file-cleanup.c hermetic. 
(authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17216?vs=47857&id=48975#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17216

Files:
  cfe/trunk/test/Driver/output-file-cleanup.c

Index: cfe/trunk/test/Driver/output-file-cleanup.c
===================================================================
--- cfe/trunk/test/Driver/output-file-cleanup.c
+++ cfe/trunk/test/Driver/output-file-cleanup.c
@@ -1,3 +1,5 @@
+// RUN: rm -f "%t.d" "%t1.s" "%t2.s" "%t3.s" "%t4.s" "%t5.s"
+//
 // RUN: touch %t.s
 // RUN: not %clang -S -DCRASH -o %t.s -MMD -MF %t.d %s
 // RUN: test ! -f %t.s


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17217: Bail on compilation as soon as a job fails.

2016-02-24 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL261774: Bail on compilation as soon as a job fails. 
(authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17217?vs=47858&id=48976#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17217

Files:
  cfe/trunk/lib/Driver/Compilation.cpp
  cfe/trunk/test/Driver/output-file-cleanup.c

Index: cfe/trunk/lib/Driver/Compilation.cpp
===================================================================
--- cfe/trunk/lib/Driver/Compilation.cpp
+++ cfe/trunk/lib/Driver/Compilation.cpp
@@ -163,39 +163,17 @@
   return ExecutionFailed ? 1 : Res;
 }
 
-typedef SmallVectorImpl< std::pair<int, const Command *> > FailingCommandList;
-
-static bool ActionFailed(const Action *A,
-                         const FailingCommandList &FailingCommands) {
-
-  if (FailingCommands.empty())
-    return false;
-
-  for (FailingCommandList::const_iterator CI = FailingCommands.begin(),
-         CE = FailingCommands.end(); CI != CE; ++CI)
-    if (A == &(CI->second->getSource()))
-      return true;
-
-  for (const Action *AI : A->inputs())
-    if (ActionFailed(AI, FailingCommands))
-      return true;
-
-  return false;
-}
-
-static bool InputsOk(const Command &C,
-                     const FailingCommandList &FailingCommands) {
-  return !ActionFailed(&C.getSource(), FailingCommands);
-}
-
-void Compilation::ExecuteJobs(const JobList &Jobs,
-                              FailingCommandList &FailingCommands) const {
+void Compilation::ExecuteJobs(
+    const JobList &Jobs,
+    SmallVectorImpl<std::pair<int, const Command *>> &FailingCommands) const {
   for (const auto &Job : Jobs) {
-    if (!InputsOk(Job, FailingCommands))
-      continue;
     const Command *FailingCommand = nullptr;
-    if (int Res = ExecuteCommand(Job, FailingCommand))
+    if (int Res = ExecuteCommand(Job, FailingCommand)) {
       FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+      // Bail as soon as one command fails, so we don't output duplicate error
+      // messages if we die on e.g. the same file.
+      return;
+    }
   }
 }
 
Index: cfe/trunk/test/Driver/output-file-cleanup.c
===================================================================
--- cfe/trunk/test/Driver/output-file-cleanup.c
+++ cfe/trunk/test/Driver/output-file-cleanup.c
@@ -38,14 +38,17 @@
 // RUN: test -f %t1.s
 // RUN: test ! -f %t2.s
 
+// When given multiple .c files to compile, clang compiles them in order until
+// it hits an error, at which point it stops.
+//
 // RUN: touch %t1.c
 // RUN: echo "invalid C code" > %t2.c
 // RUN: touch %t3.c
 // RUN: echo "invalid C code" > %t4.c
 // RUN: touch %t5.c
 // RUN: cd %T && not %clang -S %t1.c %t2.c %t3.c %t4.c %t5.c
 // RUN: test -f %t1.s
 // RUN: test ! -f %t2.s
-// RUN: test -f %t3.s
+// RUN: test ! -f %t3.s
 // RUN: test ! -f %t4.s
-// RUN: test -f %t5.s
+// RUN: test ! -f %t5.s



Re: [PATCH] D17562: [CUDA] Add hack so code which includes "curand.h" doesn't break.

2016-02-24 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL261776: [CUDA] Add hack so code which includes "curand.h" 
doesn't break. (authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17562?vs=48885&id=48977#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17562

Files:
  cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h

Index: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
===================================================================
--- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
+++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -247,5 +247,19 @@
 
 #include <__clang_cuda_cmath.h>
 
+// curand_mtgp32_kernel helpfully redeclares blockDim and threadIdx in host
+// mode, giving them their "proper" types of dim3 and uint3.  This is
+// incompatible with the types we give in cuda_builtin_vars.h.  As a hack,
+// force-include the header (nvcc doesn't include it by default) but redefine
+// dim3 and uint3 to our builtin types.  (Thankfully dim3 and uint3 are only
+// used here for the redeclarations of blockDim and threadIdx.)
+#pragma push_macro("dim3")
+#pragma push_macro("uint3")
+#define dim3 __cuda_builtin_blockDim_t
+#define uint3 __cuda_builtin_threadIdx_t
+#include "curand_mtgp32_kernel.h"
+#pragma pop_macro("dim3")
+#pragma pop_macro("uint3")
+
 #endif // __CUDA__
 #endif // __CLANG_CUDA_RUNTIME_WRAPPER_H__


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r261777 - [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.

2016-02-24 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 24 15:49:33 2016
New Revision: 261777

URL: http://llvm.org/viewvc/llvm-project?rev=261777&view=rev
Log:
[CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim 
to uint3 and dim3.

Summary:
This lets you write, e.g.

  uint3 a = threadIdx;
  uint3 b = blockIdx;
  dim3 c = gridDim;
  dim3 d = blockDim;

which is legal in nvcc, but was not legal in clang.

The fact that e.g. the type of threadIdx is not actually uint3 is still
observable, but now you have to try to observe it.
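
For example (a minimal sketch, not from the patch; the function name is
made up), the conversion now kicks in wherever a uint3 is wanted, while the
builtin's real type remains visible to decltype:

  __device__ void observe() {
    uint3 t = threadIdx;  // OK now: uses the new conversion operator
    // decltype(threadIdx) is still __cuda_builtin_threadIdx_t, not uint3,
    // so e.g. taking threadIdx's address remains ill-formed by design.
    (void)t;
  }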

Reviewers: tra

Subscribers: echristo, cfe-commits

Differential Revision: http://reviews.llvm.org/D17561

Modified:
cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
cfe/trunk/lib/Headers/cuda_builtin_vars.h

Modified: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h?rev=261777&r1=261776&r2=261777&view=diff
===================================================================
--- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h (original)
+++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h Wed Feb 24 15:49:33 2016
@@ -245,6 +245,33 @@ __device__ static inline void *malloc(si
 }
 } // namespace std
 
+// Out-of-line implementations from cuda_builtin_vars.h.  These need to come
+// after we've pulled in the definition of uint3 and dim3.
+
+__device__ inline __cuda_builtin_threadIdx_t::operator uint3() const {
+  uint3 ret;
+  ret.x = x;
+  ret.y = y;
+  ret.z = z;
+  return ret;
+}
+
+__device__ inline __cuda_builtin_blockIdx_t::operator uint3() const {
+  uint3 ret;
+  ret.x = x;
+  ret.y = y;
+  ret.z = z;
+  return ret;
+}
+
+__device__ inline __cuda_builtin_blockDim_t::operator dim3() const {
+  return dim3(x, y, z);
+}
+
+__device__ inline __cuda_builtin_gridDim_t::operator dim3() const {
+  return dim3(x, y, z);
+}
+
 #include <__clang_cuda_cmath.h>
 
 // curand_mtgp32_kernel helpfully redeclares blockDim and threadIdx in host

Modified: cfe/trunk/lib/Headers/cuda_builtin_vars.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_builtin_vars.h?rev=261777&r1=261776&r2=261777&view=diff
===================================================================
--- cfe/trunk/lib/Headers/cuda_builtin_vars.h (original)
+++ cfe/trunk/lib/Headers/cuda_builtin_vars.h Wed Feb 24 15:49:33 2016
@@ -24,10 +24,14 @@
 #ifndef __CUDA_BUILTIN_VARS_H
 #define __CUDA_BUILTIN_VARS_H
 
+// Forward declares from vector_types.h.
+struct uint3;
+struct dim3;
+
 // The file implements built-in CUDA variables using __declspec(property).
 // https://msdn.microsoft.com/en-us/library/yhfk0thd.aspx
 // All read accesses of built-in variable fields get converted into calls to a
-// getter function which in turn would call appropriate builtin to fetch the
+// getter function which in turn calls the appropriate builtin to fetch the
 // value.
 //
 // Example:
@@ -63,6 +67,9 @@ struct __cuda_builtin_threadIdx_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_tid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_tid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_tid_z());
+  // threadIdx should be convertible to uint3 (in fact in nvcc, it *is* a
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_threadIdx_t);
 };
@@ -71,6 +78,9 @@ struct __cuda_builtin_blockIdx_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_ctaid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_ctaid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_ctaid_z());
+  // blockIdx should be convertible to uint3 (in fact in nvcc, it *is* a
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_blockIdx_t);
 };
@@ -79,6 +89,9 @@ struct __cuda_builtin_blockDim_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_ntid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_ntid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_ntid_z());
+  // blockDim should be convertible to dim3 (in fact in nvcc, it *is* a
+  // dim3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator dim3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_blockDim_t);
 };
@@ -87,6 +100,9 @@ struct __cuda_builtin_gridDim_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_nctaid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_nctaid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_nctaid_z());
+  // gridDim should be convertible to dim3 (in fact in nvcc, it *is* a
+  // dim3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator dim3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_gridDim_t);
 };


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

r261773 - Make test/Driver/output-file-cleanup.c hermetic.

2016-02-24 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 24 15:49:26 2016
New Revision: 261773

URL: http://llvm.org/viewvc/llvm-project?rev=261773&view=rev
Log:
Make test/Driver/output-file-cleanup.c hermetic.

Summary:
It checks that certain files do and don't exist, so make sure that they
don't exist at the beginning of the test.

This hid a failure in r260448; to see the failure, you had to run the test with
a clean-ish objdir.

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D17216

Modified:
cfe/trunk/test/Driver/output-file-cleanup.c

Modified: cfe/trunk/test/Driver/output-file-cleanup.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/output-file-cleanup.c?rev=261773&r1=261772&r2=261773&view=diff
===================================================================
--- cfe/trunk/test/Driver/output-file-cleanup.c (original)
+++ cfe/trunk/test/Driver/output-file-cleanup.c Wed Feb 24 15:49:26 2016
@@ -1,3 +1,5 @@
+// RUN: rm -f "%t.d" "%t1.s" "%t2.s" "%t3.s" "%t4.s" "%t5.s"
+//
 // RUN: touch %t.s
 // RUN: not %clang -S -DCRASH -o %t.s -MMD -MF %t.d %s
 // RUN: test ! -f %t.s


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r261774 - Bail on compilation as soon as a job fails.

2016-02-24 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 24 15:49:28 2016
New Revision: 261774

URL: http://llvm.org/viewvc/llvm-project?rev=261774&view=rev
Log:
Bail on compilation as soon as a job fails.

Summary:
(Re-land of r260448, which was reverted in r260522 due to a test failure
in Driver/output-file-cleanup.c that only showed up in fresh builds.)

Previously we attempted to be smart; if one job failed, we'd run all
jobs that didn't depend on the failing job.

Problem is, this doesn't work well for e.g. CUDA compilation without
-save-temps.  In this case, the device-side and host-side Assemble
actions (which actually are responsible for preprocess, compile,
backend, and assemble, since we're not saving temps) are necessarily
distinct.  So our clever heuristic doesn't help us, and we repeat every
error message once for host and once for each device arch.

The main effect of this change, other than fixing CUDA, is that if you
pass multiple cc files to one instance of clang and you get a compile
error, we'll stop when the first cc1 job fails.

Reviewers: echristo

Subscribers: cfe-commits, jhen, echristo, tra, rafael

Differential Revision: http://reviews.llvm.org/D17217

Modified:
cfe/trunk/lib/Driver/Compilation.cpp
cfe/trunk/test/Driver/output-file-cleanup.c

Modified: cfe/trunk/lib/Driver/Compilation.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Compilation.cpp?rev=261774&r1=261773&r2=261774&view=diff
===================================================================
--- cfe/trunk/lib/Driver/Compilation.cpp (original)
+++ cfe/trunk/lib/Driver/Compilation.cpp Wed Feb 24 15:49:28 2016
@@ -163,39 +163,17 @@ int Compilation::ExecuteCommand(const Co
   return ExecutionFailed ? 1 : Res;
 }
 
-typedef SmallVectorImpl< std::pair<int, const Command *> > FailingCommandList;
-
-static bool ActionFailed(const Action *A,
-                         const FailingCommandList &FailingCommands) {
-
-  if (FailingCommands.empty())
-    return false;
-
-  for (FailingCommandList::const_iterator CI = FailingCommands.begin(),
-         CE = FailingCommands.end(); CI != CE; ++CI)
-    if (A == &(CI->second->getSource()))
-      return true;
-
-  for (const Action *AI : A->inputs())
-    if (ActionFailed(AI, FailingCommands))
-      return true;
-
-  return false;
-}
-
-static bool InputsOk(const Command &C,
-                     const FailingCommandList &FailingCommands) {
-  return !ActionFailed(&C.getSource(), FailingCommands);
-}
-
-void Compilation::ExecuteJobs(const JobList &Jobs,
-                              FailingCommandList &FailingCommands) const {
+void Compilation::ExecuteJobs(
+    const JobList &Jobs,
+    SmallVectorImpl<std::pair<int, const Command *>> &FailingCommands) const {
   for (const auto &Job : Jobs) {
-    if (!InputsOk(Job, FailingCommands))
-      continue;
     const Command *FailingCommand = nullptr;
-    if (int Res = ExecuteCommand(Job, FailingCommand))
+    if (int Res = ExecuteCommand(Job, FailingCommand)) {
       FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+      // Bail as soon as one command fails, so we don't output duplicate error
+      // messages if we die on e.g. the same file.
+      return;
+    }
   }
 }
 

Modified: cfe/trunk/test/Driver/output-file-cleanup.c
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Driver/output-file-cleanup.c?rev=261774&r1=261773&r2=261774&view=diff
===================================================================
--- cfe/trunk/test/Driver/output-file-cleanup.c (original)
+++ cfe/trunk/test/Driver/output-file-cleanup.c Wed Feb 24 15:49:28 2016
@@ -38,6 +38,9 @@ invalid C code
 // RUN: test -f %t1.s
 // RUN: test ! -f %t2.s
 
+// When given multiple .c files to compile, clang compiles them in order until
+// it hits an error, at which point it stops.
+//
 // RUN: touch %t1.c
 // RUN: echo "invalid C code" > %t2.c
 // RUN: touch %t3.c
@@ -46,6 +49,6 @@ invalid C code
 // RUN: cd %T && not %clang -S %t1.c %t2.c %t3.c %t4.c %t5.c
 // RUN: test -f %t1.s
 // RUN: test ! -f %t2.s
-// RUN: test -f %t3.s
+// RUN: test ! -f %t3.s
 // RUN: test ! -f %t4.s
-// RUN: test -f %t5.s
+// RUN: test ! -f %t5.s


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r261775 - [CUDA] Don't specify exact line numbers in cuda-builtin-vars.cu.

2016-02-24 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 24 15:49:30 2016
New Revision: 261775

URL: http://llvm.org/viewvc/llvm-project?rev=261775&view=rev
Log:
[CUDA] Don't specify exact line numbers in cuda-builtin-vars.cu.

This makes the test less fragile to changes to cuda_builtin_vars.h.

Test-only change.

Modified:
cfe/trunk/test/SemaCUDA/cuda-builtin-vars.cu

Modified: cfe/trunk/test/SemaCUDA/cuda-builtin-vars.cu
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/SemaCUDA/cuda-builtin-vars.cu?rev=261775&r1=261774&r2=261775&view=diff
===================================================================
--- cfe/trunk/test/SemaCUDA/cuda-builtin-vars.cu (original)
+++ cfe/trunk/test/SemaCUDA/cuda-builtin-vars.cu Wed Feb 24 15:49:30 2016
@@ -34,20 +34,20 @@ void kernel(int *out) {
 
   out[i++] = warpSize;
   warpSize = 0; // expected-error {{cannot assign to variable 'warpSize' with const-qualified type 'const int'}}
-  // expected-note@cuda_builtin_vars.h:104 {{variable 'warpSize' declared const here}}
+  // expected-note@cuda_builtin_vars.h:* {{variable 'warpSize' declared const here}}
 
   // Make sure we can't construct or assign to the special variables.
   __cuda_builtin_threadIdx_t x; // expected-error {{calling a private constructor of class '__cuda_builtin_threadIdx_t'}}
-  // expected-note@cuda_builtin_vars.h:67 {{declared private here}}
+  // expected-note@cuda_builtin_vars.h:* {{declared private here}}
 
   __cuda_builtin_threadIdx_t y = threadIdx; // expected-error {{calling a private constructor of class '__cuda_builtin_threadIdx_t'}}
-  // expected-note@cuda_builtin_vars.h:67 {{declared private here}}
+  // expected-note@cuda_builtin_vars.h:* {{declared private here}}
 
   threadIdx = threadIdx; // expected-error {{'operator=' is a private member of '__cuda_builtin_threadIdx_t'}}
-  // expected-note@cuda_builtin_vars.h:67 {{declared private here}}
+  // expected-note@cuda_builtin_vars.h:* {{declared private here}}
 
   void *ptr = &threadIdx; // expected-error {{'operator&' is a private member of '__cuda_builtin_threadIdx_t'}}
-  // expected-note@cuda_builtin_vars.h:67 {{declared private here}}
+  // expected-note@cuda_builtin_vars.h:* {{declared private here}}
 
   // Following line should've caused an error as one is not allowed to
   // take address of a built-in variable in CUDA. Alas there's no way


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r261776 - [CUDA] Add hack so code which includes "curand.h" doesn't break.

2016-02-24 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 24 15:49:31 2016
New Revision: 261776

URL: http://llvm.org/viewvc/llvm-project?rev=261776&view=rev
Log:
[CUDA] Add hack so code which includes "curand.h" doesn't break.

Summary:
curand.h includes curand_mtgp32_kernel.h.  In host mode, this header
redefines threadIdx and blockDim, giving them their "proper" types of
uint3 and dim3, respectively.

clang has its own plan for these variables -- their types are magic
builtin classes.  So these redefinitions are incompatible.

As a hack, we force-include the offending CUDA header and use #defines
to get the right types for threadIdx and blockDim.
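
For reference, a minimal sketch of the push_macro/pop_macro mechanism the
patch relies on (the included header name here is hypothetical):

  #pragma push_macro("uint3")               // save the current 'uint3' macro, if any
  #define uint3 __cuda_builtin_threadIdx_t  // the header's 'uint3 threadIdx' now
  #include "redeclares_threadIdx.h"         //   redeclares with our builtin type
  #pragma pop_macro("uint3")                // restore 'uint3' for everyone else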

Reviewers: tra

Subscribers: echristo, cfe-commits

Differential Revision: http://reviews.llvm.org/D17562

Modified:
cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h

Modified: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h?rev=261776&r1=261775&r2=261776&view=diff
===================================================================
--- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h (original)
+++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h Wed Feb 24 15:49:31 2016
@@ -247,5 +247,19 @@ __device__ static inline void *malloc(si
 
 #include <__clang_cuda_cmath.h>
 
+// curand_mtgp32_kernel helpfully redeclares blockDim and threadIdx in host
+// mode, giving them their "proper" types of dim3 and uint3.  This is
+// incompatible with the types we give in cuda_builtin_vars.h.  As a hack,
+// force-include the header (nvcc doesn't include it by default) but redefine
+// dim3 and uint3 to our builtin types.  (Thankfully dim3 and uint3 are only
+// used here for the redeclarations of blockDim and threadIdx.)
+#pragma push_macro("dim3")
+#pragma push_macro("uint3")
+#define dim3 __cuda_builtin_blockDim_t
+#define uint3 __cuda_builtin_threadIdx_t
+#include "curand_mtgp32_kernel.h"
+#pragma pop_macro("dim3")
+#pragma pop_macro("uint3")
+
 #endif // __CUDA__
 #endif // __CLANG_CUDA_RUNTIME_WRAPPER_H__


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17217: Bail on compilation as soon as a job fails.

2016-02-24 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Upon further consideration, I'm going to push this with post-submit review -- 
the only difference from what echristo reviewed is an obvious change to 
output-file-cleanup.c.


http://reviews.llvm.org/D17217



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-24 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Friendly ping -- are we happy with this?


http://reviews.llvm.org/D17056



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17216: Make test/Driver/output-file-cleanup.c hermetic.

2016-02-24 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Upon further consideration, I think I'm going to push this with post-submit 
review, as we know this test is quite broken, and I think this change is 
unlikely to make the situation worse.


http://reviews.llvm.org/D17216



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17561: [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.

2016-02-24 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Headers/cuda_builtin_vars.h:72
@@ -66,1 +71,3 @@
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:

tra wrote:
> Considering that built-in variables are never instantiated, I wonder how it's 
> going to work as the operator will presumably need 'this' pointing 
> *somewhere*, even if we don't use it. Unused 'this' would probably get 
> optimized away with optimizations on, but -O0 may cause problems.
This is interesting.  In the ptx, threadIdx actually gets instantiated, as a 
non-weak global:

  .global .align 1 .b8 threadIdx[1];

Then we take the address of this thing.

At -O2, we don't emit a threadIdx global at all.

I think this is basically fine.  It's actually not right to change extern to 
static in the decl, because then we try to construct a 
__cuda_builtin_threadIdx_t, and the default constructor is deleted.  :)
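
(A minimal sketch of that failure mode, with stand-in names:)

  struct BuiltinVarLike {
    BuiltinVarLike() = delete;  // mimics the private/deleted constructor
  };
  extern BuiltinVarLike tid;    // OK: a declaration only, nothing constructed
  static BuiltinVarLike bid;    // error: this definition calls the deleted ctor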


http://reviews.llvm.org/D17561



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D17562: [CUDA] Add hack so code which includes "curand.h" doesn't break.

2016-02-23 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: tra.
jlebar added subscribers: cfe-commits, echristo.

curand.h includes curand_mtgp32_kernel.h.  In host mode, this header
redefines threadIdx and blockDim, giving them their "proper" types of
uint3 and dim3, respectively.

clang has its own plan for these variables -- their types are magic
builtin classes.  So these redefinitions are incompatible.

As a hack, we force-include the offending CUDA header and use #defines
to get the right types for threadIdx and blockDim.

http://reviews.llvm.org/D17562

Files:
  lib/Headers/__clang_cuda_runtime_wrapper.h

Index: lib/Headers/__clang_cuda_runtime_wrapper.h
===================================================================
--- lib/Headers/__clang_cuda_runtime_wrapper.h
+++ lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -274,5 +274,19 @@
 
 #include <__clang_cuda_cmath.h>
 
+// curand_mtgp32_kernel helpfully redeclares blockDim and threadIdx in host
+// mode, giving them their "proper" types of dim3 and uint3.  This is
+// incompatible with the types we give in cuda_builtin_vars.h.  As a hack,
+// force-include the header (nvcc doesn't include it by default) but redefine
+// dim3 and uint3 to our builtin types.  (Thankfully dim3 and uint3 are only
+// used here for the redeclarations of blockDim and threadIdx.)
+#pragma push_macro("dim3")
+#pragma push_macro("uint3")
+#define dim3 __cuda_builtin_blockDim_t
+#define uint3 __cuda_builtin_threadIdx_t
+#include "curand_mtgp32_kernel.h"
+#pragma pop_macro("dim3")
+#pragma pop_macro("uint3")
+
 #endif // __CUDA__
 #endif // __CLANG_CUDA_RUNTIME_WRAPPER_H__


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D17561: [CUDA] Add conversion operators for threadIdx, blockIdx, gridDim, and blockDim to uint3 and dim3.

2016-02-23 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: tra.
jlebar added subscribers: cfe-commits, echristo.

This lets you write, e.g.

  uint3 a = threadIdx;
  uint3 b = blockIdx;
  dim3 c = gridDim;
  dim3 d = blockDim;

which is legal in nvcc, but was not legal in clang.

The fact that e.g. the type of threadIdx is not actually uint3 is still
observable, but now you have to try to observe it.

http://reviews.llvm.org/D17561

Files:
  lib/Headers/__clang_cuda_runtime_wrapper.h
  lib/Headers/cuda_builtin_vars.h

Index: lib/Headers/cuda_builtin_vars.h
===================================================================
--- lib/Headers/cuda_builtin_vars.h
+++ lib/Headers/cuda_builtin_vars.h
@@ -24,10 +24,14 @@
 #ifndef __CUDA_BUILTIN_VARS_H
 #define __CUDA_BUILTIN_VARS_H
 
+// Forward declares from vector_types.h.
+struct uint3;
+struct dim3;
+
 // The file implements built-in CUDA variables using __declspec(property).
 // https://msdn.microsoft.com/en-us/library/yhfk0thd.aspx
 // All read accesses of built-in variable fields get converted into calls to a
-// getter function which in turn would call appropriate builtin to fetch the
+// getter function which in turn calls the appropriate builtin to fetch the
 // value.
 //
 // Example:
@@ -63,30 +67,42 @@
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_tid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_tid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_tid_z());
+  // threadIdx should be convertible to uint3 (in fact in nvcc, it *is* a
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_threadIdx_t);
 };
 
 struct __cuda_builtin_blockIdx_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_ctaid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_ctaid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_ctaid_z());
+  // blockIdx should be convertible to uint3 (in fact in nvcc, it *is* a
+  // uint3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator uint3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_blockIdx_t);
 };
 
 struct __cuda_builtin_blockDim_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_ntid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_ntid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_ntid_z());
+  // blockDim should be convertible to dim3 (in fact in nvcc, it *is* a
+  // dim3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator dim3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_blockDim_t);
 };
 
 struct __cuda_builtin_gridDim_t {
   __CUDA_DEVICE_BUILTIN(x,__builtin_ptx_read_nctaid_x());
   __CUDA_DEVICE_BUILTIN(y,__builtin_ptx_read_nctaid_y());
   __CUDA_DEVICE_BUILTIN(z,__builtin_ptx_read_nctaid_z());
+  // gridDim should be convertible to dim3 (in fact in nvcc, it *is* a
+  // dim3).  This function is defined after we pull in vector_types.h.
+  __attribute__((device)) operator dim3() const;
 private:
   __CUDA_DISALLOW_BUILTINVAR_ACCESS(__cuda_builtin_gridDim_t);
 };
Index: lib/Headers/__clang_cuda_runtime_wrapper.h
===================================================================
--- lib/Headers/__clang_cuda_runtime_wrapper.h
+++ lib/Headers/__clang_cuda_runtime_wrapper.h
@@ -245,6 +245,33 @@
 }
 } // namespace std
 
+// Out-of-line implementations from cuda_builtin_vars.h.  These need to come
+// after we've pulled in the definition of uint3 and dim3.
+
+__device__ inline __cuda_builtin_threadIdx_t::operator uint3() const {
+  uint3 ret;
+  ret.x = x;
+  ret.y = y;
+  ret.z = z;
+  return ret;
+}
+
+__device__ inline __cuda_builtin_blockIdx_t::operator uint3() const {
+  uint3 ret;
+  ret.x = x;
+  ret.y = y;
+  ret.z = z;
+  return ret;
+}
+
+__device__ inline __cuda_builtin_blockDim_t::operator dim3() const {
+  return dim3(x, y, z);
+}
+
+__device__ inline __cuda_builtin_gridDim_t::operator dim3() const {
+  return dim3(x, y, z);
+}
+
 #include <__clang_cuda_cmath.h>
 
 #endif // __CUDA__
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-17 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 48261.
jlebar added a comment.

Move code into ConstructAttributeList.  Now it applies to both functions and 
calls.


http://reviews.llvm.org/D17056

Files:
  lib/CodeGen/CGCall.cpp
  test/CodeGenCUDA/convergent.cu
  test/CodeGenCUDA/device-var-init.cu

Index: test/CodeGenCUDA/device-var-init.cu
===================================================================
--- test/CodeGenCUDA/device-var-init.cu
+++ test/CodeGenCUDA/device-var-init.cu
@@ -382,7 +382,7 @@
 // CHECK:   call void @_ZN4NETCC1IJEEEDpT_(%struct.NETC* %netc)
 // CHECK:   call void @_ZN7EC_I_ECC1Ev(%struct.EC_I_EC* %ec_i_ec)
 // CHECK:   call void @_ZN8EC_I_EC1C1Ev(%struct.EC_I_EC1* %ec_i_ec1)
-// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t) #3
+// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t)
 // CHECK:   call void @_ZN7T_B_NECC1Ev(%struct.T_B_NEC* %t_b_nec)
 // CHECK:   call void @_ZN7T_F_NECC1Ev(%struct.T_F_NEC* %t_f_nec)
 // CHECK:   call void @_ZN8T_FA_NECC1Ev(%struct.T_FA_NEC* %t_fa_nec)
Index: test/CodeGenCUDA/convergent.cu
===================================================================
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,39 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() {
+  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
+  baz();
+}
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+// DEVICE: attributes [[CALL_ATTR]] = { convergent }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: lib/CodeGen/CGCall.cpp
===================================================================
--- lib/CodeGen/CGCall.cpp
+++ lib/CodeGen/CGCall.cpp
@@ -1595,6 +1595,14 @@
 }
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+    // Conservatively, mark all functions and calls in CUDA as convergent
+    // (meaning, they may call an intrinsically convergent op, such as
+    // __syncthreads(), and so can't have certain optimizations applied around
+    // them).  LLVM will remove this attribute where it safely can.
+    FuncAttrs.addAttribute(llvm::Attribute::Convergent);
+  }
+
   ClangToLLVMArgMapping IRFunctionArgs(getContext(), FI);
 
   QualType RetTy = FI.getReturnType();



Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-17 Thread Justin Lebar via cfe-commits
jlebar added a comment.

In http://reviews.llvm.org/D17056#355228, @jlebar wrote:

> Move code into SetLLVMFunctionAttributes (not ForDefinition).


Much better.  :)


http://reviews.llvm.org/D17056



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-17 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 48260.
jlebar added a comment.

Move code into SetLLVMFunctionAttributes (not ForDefinition).


http://reviews.llvm.org/D17056

Files:
  lib/CodeGen/CodeGenModule.cpp
  test/CodeGenCUDA/convergent.cu

Index: test/CodeGenCUDA/convergent.cu
===================================================================
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,35 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() { baz(); }
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: lib/CodeGen/CodeGenModule.cpp
===================================================================
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -813,6 +813,14 @@
                          false);
   F->setAttributes(llvm::AttributeSet::get(getLLVMContext(), AttributeList));
   F->setCallingConv(static_cast<llvm::CallingConv::ID>(CallingConv));
+
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+    // Conservatively, mark all functions in CUDA as convergent (meaning, they
+    // may call an intrinsically convergent op, such as __syncthreads(), and so
+    // can't have certain optimizations applied around them).  LLVM will remove
+    // this attribute where it safely can.
+    F->addFnAttr(llvm::Attribute::Convergent);
+  }
 }
 
 /// Determines whether the language options require us to model


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-17 Thread Justin Lebar via cfe-commits
jlebar added a comment.

In http://reviews.llvm.org/D17056#355198, @jlebar wrote:

> Move code into SetLLVMFunctionAttributesForDefinition.


Actually, this doesn't work -- we don't annotate

  __host__ __device__ void baz();

as convergent.  (I ran the tests, but of course I didn't notice it failing...)


http://reviews.llvm.org/D17056



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-17 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 48255.
jlebar added a comment.

Move code into SetLLVMFunctionAttributesForDefinition.


http://reviews.llvm.org/D17056

Files:
  lib/CodeGen/CodeGenModule.cpp
  test/CodeGenCUDA/convergent.cu

Index: test/CodeGenCUDA/convergent.cu
===================================================================
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,35 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() { baz(); }
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: lib/CodeGen/CodeGenModule.cpp
===================================================================
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -922,6 +922,14 @@
     if (F->getAlignment() < 2 && isa<CXXMethodDecl>(D))
       F->setAlignment(2);
   }
+
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+    // Conservatively, mark all functions in CUDA as convergent (meaning, they
+    // may call an intrinsically convergent op, such as __syncthreads(), and so
+    // can't have certain optimizations applied around them).  LLVM will remove
+    // this attribute where it safely can.
+    F->addFnAttr(llvm::Attribute::Convergent);
+  }
 }
 
 void CodeGenModule::SetCommonAttributes(const Decl *D,


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17313: [CUDA] Annotate all calls in CUDA device mode as convergent.

2016-02-16 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 48143.
jlebar added a comment.

Fix typo.


http://reviews.llvm.org/D17313

Files:
  lib/CodeGen/CGCall.cpp
  test/CodeGenCUDA/convergent.cu
  test/CodeGenCUDA/device-var-init.cu

Index: test/CodeGenCUDA/device-var-init.cu
===
--- test/CodeGenCUDA/device-var-init.cu
+++ test/CodeGenCUDA/device-var-init.cu
@@ -382,7 +382,7 @@
 // CHECK:   call void @_ZN4NETCC1IJEEEDpT_(%struct.NETC* %netc)
 // CHECK:   call void @_ZN7EC_I_ECC1Ev(%struct.EC_I_EC* %ec_i_ec)
 // CHECK:   call void @_ZN8EC_I_EC1C1Ev(%struct.EC_I_EC1* %ec_i_ec1)
-// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t) #3
+// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t)
 // CHECK:   call void @_ZN7T_B_NECC1Ev(%struct.T_B_NEC* %t_b_nec)
 // CHECK:   call void @_ZN7T_F_NECC1Ev(%struct.T_F_NEC* %t_f_nec)
 // CHECK:   call void @_ZN8T_FA_NECC1Ev(%struct.T_FA_NEC* %t_fa_nec)
Index: test/CodeGenCUDA/convergent.cu
===
--- test/CodeGenCUDA/convergent.cu
+++ test/CodeGenCUDA/convergent.cu
@@ -22,12 +22,16 @@
 // DEVICE-SAME: convergent
 // DEVICE-NEXT: define void @_Z3barv
 __host__ __device__ void baz();
-__host__ __device__ void bar() { baz(); }
+__host__ __device__ void bar() {
+  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
+  baz();
+}
 
 // DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
 // DEVICE: attributes [[BAZ_ATTR]] = {
 // DEVICE-SAME: convergent
 // DEVICE-SAME: }
+// DEVICE: attributes [[CALL_ATTR]] = { convergent }
 
 // HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
 // HOST: attributes [[BAZ_ATTR]] = {
Index: lib/CodeGen/CGCall.cpp
===
--- lib/CodeGen/CGCall.cpp
+++ lib/CodeGen/CGCall.cpp
@@ -3139,7 +3139,13 @@
   if (CGM.getLangOpts().ObjCAutoRefCount)
 AddObjCARCExceptionMetadata(Inst);
 
-  return llvm::CallSite(Inst);
+  llvm::CallSite CS(Inst);
+  // All calls in CUDA device mode must conservatively be marked as convergent,
+  // preventing some optimizations.  The optimizer can remove this if it can
+  // prove that the callee is not convergent.
+  if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice)
+CS.setConvergent();
+  return CS;
 }
 
 /// \brief Store a non-aggregate value to an address to initialize it.  For
@@ -3539,6 +3545,14 @@
 Attrs.addAttribute(getLLVMContext(), llvm::AttributeSet::FunctionIndex,
llvm::Attribute::NoInline);
 
+  // All calls in CUDA device code are conservatively marked as convergent.  The
+  // optimizer is able to remove this attribute if it can prove that the callee
+  // is not convergent.
+  if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice)
+Attrs =
+Attrs.addAttribute(getLLVMContext(), llvm::AttributeSet::FunctionIndex,
+   llvm::Attribute::Convergent);
+
   CS.setAttributes(Attrs);
   CS.setCallingConv(static_cast<llvm::CallingConv::ID>(CallingConv));
 


[PATCH] D17313: [CUDA] Annotate all calls in CUDA device mode as convergent.

2016-02-16 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added reviewers: rnk, majnemer.
jlebar added subscribers: tra, cfe-commits.

We need the notion of convergent functions -- which may expose
convergent behavior to callers -- and convergent calls, which are calls
where we would like to preserve convergent behavior in the callee, if
possible.

In CUDA device mode, all calls and functions are convergent.  The
optimizer can then strip this away under some circumstances.
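
To make the hazard concrete, here is a minimal sketch (hypothetical code, not
from the patch; `helper` is a made-up name) of the transform the attribute
rules out:

```
__device__ int helper();  // assume it calls __syncthreads() internally

__global__ void kernel(int *out) {
  int v = helper();          // executed in uniform control flow
  if (threadIdx.x % 2 == 0)  // divergent branch
    out[threadIdx.x] = v;    // sinking the helper() call into this branch
                             // would deadlock: only half the threads would
                             // reach the barrier inside helper()
}
```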

http://reviews.llvm.org/D17313

Files:
  lib/CodeGen/CGCall.cpp
  test/CodeGenCUDA/convergent.cu
  test/CodeGenCUDA/device-var-init.cu

Index: test/CodeGenCUDA/device-var-init.cu
===
--- test/CodeGenCUDA/device-var-init.cu
+++ test/CodeGenCUDA/device-var-init.cu
@@ -382,7 +382,7 @@
 // CHECK:   call void @_ZN4NETCC1IJEEEDpT_(%struct.NETC* %netc)
 // CHECK:   call void @_ZN7EC_I_ECC1Ev(%struct.EC_I_EC* %ec_i_ec)
 // CHECK:   call void @_ZN8EC_I_EC1C1Ev(%struct.EC_I_EC1* %ec_i_ec1)
-// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t) #3
+// CHECK:   call void @_ZN5T_V_TC1Ev(%struct.T_V_T* %t_v_t)
 // CHECK:   call void @_ZN7T_B_NECC1Ev(%struct.T_B_NEC* %t_b_nec)
 // CHECK:   call void @_ZN7T_F_NECC1Ev(%struct.T_F_NEC* %t_f_nec)
 // CHECK:   call void @_ZN8T_FA_NECC1Ev(%struct.T_FA_NEC* %t_fa_nec)
Index: test/CodeGenCUDA/convergent.cu
===
--- test/CodeGenCUDA/convergent.cu
+++ test/CodeGenCUDA/convergent.cu
@@ -22,12 +22,16 @@
 // DEVICE-SAME: convergent
 // DEVICE-NEXT: define void @_Z3barv
 __host__ __device__ void baz();
-__host__ __device__ void bar() { baz(); }
+__host__ __device__ void bar() {
+  // DEVICE: call void @_Z3bazv() [[CALL_ATTR:#[0-9]+]]
+  baz();
+}
 
 // DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
 // DEVICE: attributes [[BAZ_ATTR]] = {
 // DEVICE-SAME: convergent
 // DEVICE-SAME: }
+// DEVICE: attributes [[CALL_ATTR]] = { convergent }
 
 // HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
 // HOST: attributes [[BAZ_ATTR]] = {
Index: lib/CodeGen/CGCall.cpp
===
--- lib/CodeGen/CGCall.cpp
+++ lib/CodeGen/CGCall.cpp
@@ -3139,7 +3139,15 @@
   if (CGM.getLangOpts().ObjCAutoRefCount)
 AddObjCARCExceptionMetadata(Inst);
 
-  return llvm::CallSite(Inst);
+  llvm::CallSite CS(Inst);
+  // All calls in CUDA device mode must conservatively be marked as convergent,
+  // preventing some optimizations.  The optimizer can remove this if it can
+  // prove that the callee is not convergent.
+  if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice) {
+CS.addAttribute(llvm::AttributeSet::FunctionIndex,
+llvm::Attribute::Convergent);
+  }
+  return CS;
 }
 
 /// \brief Store a non-aggregate value to an address to initialize it.  For
@@ -3539,6 +3547,14 @@
 Attrs.addAttribute(getLLVMContext(), llvm::AttributeSet::FunctionIndex,
llvm::Attribute::NoInline);
 
+  // All calls in CUDA device code are conservatively marked as convergent.  The
+  // optimizer is able to remove this attribute if it can prove that the callee
+  // is not convergent.
+  if (CGM.getLangOpts().CUDA && CGM.getLangOpts().CUDAIsDevice)
+Attrs =
+Attrs.addAttribute(getLLVMContext(), llvm::AttributeSet::FunctionIndex,
+   llvm::Attribute::Convergent);
+
   CS.setAttributes(Attrs);
   CS.setCallingConv(static_cast<llvm::CallingConv::ID>(CallingConv));
 


[PATCH] D17216: Make test/Driver/output-file-cleanup.c hermetic.

2016-02-12 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: rafael.
jlebar added a subscriber: cfe-commits.

It checks that certain files do and don't exist, so make sure that they don't
exist at the beginning of the test.

This hid a failure in r260448; to see the failure, you had to run the test with
a clean-ish objdir.

http://reviews.llvm.org/D17216

Files:
  test/Driver/output-file-cleanup.c

Index: test/Driver/output-file-cleanup.c
===
--- test/Driver/output-file-cleanup.c
+++ test/Driver/output-file-cleanup.c
@@ -1,3 +1,5 @@
+// RUN: rm -f "%t.d" "%t1.s" "%t2.s" "%t3.s" "%t4.s" "%t5.s"
+//
 // RUN: touch %t.s
 // RUN: not %clang -S -DCRASH -o %t.s -MMD -MF %t.d %s
 // RUN: test ! -f %t.s




[PATCH] D17217: Bail on compilation as soon as a job fails.

2016-02-12 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: rafael.
jlebar added subscribers: tra, echristo, jhen, cfe-commits.

(Re-land of r260448, which was reverted in r260522 due to a test failure
in Driver/output-file-cleanup.c that only showed up in fresh builds.)

Previously we attempted to be smart; if one job failed, we'd run all
jobs that didn't depend on the failing job.

Problem is, this doesn't work well for e.g. CUDA compilation without
-save-temps.  In this case, the device-side and host-side Assemble
actions (which actually are responsible for preprocess, compile,
backend, and assemble, since we're not saving temps) are necessarily
distinct.  So our clever heuristic doesn't help us, and we repeat every
error message once for host and once for each device arch.

The main effect of this change, other than fixing CUDA, is that if you
pass multiple cc files to one instance of clang and you get a compile
error, we'll stop when the first cc1 job fails.

http://reviews.llvm.org/D17217

Files:
  lib/Driver/Compilation.cpp
  test/Driver/output-file-cleanup.c

Index: test/Driver/output-file-cleanup.c
===
--- test/Driver/output-file-cleanup.c
+++ test/Driver/output-file-cleanup.c
@@ -38,14 +38,17 @@
 // RUN: test -f %t1.s
 // RUN: test ! -f %t2.s
 
+// When given multiple .c files to compile, clang compiles them in order until
+// it hits an error, at which point it stops.
+//
 // RUN: touch %t1.c
 // RUN: echo "invalid C code" > %t2.c
 // RUN: touch %t3.c
 // RUN: echo "invalid C code" > %t4.c
 // RUN: touch %t5.c
 // RUN: cd %T && not %clang -S %t1.c %t2.c %t3.c %t4.c %t5.c
 // RUN: test -f %t1.s
 // RUN: test ! -f %t2.s
-// RUN: test -f %t3.s
+// RUN: test ! -f %t3.s
 // RUN: test ! -f %t4.s
-// RUN: test -f %t5.s
+// RUN: test ! -f %t5.s
Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilation.cpp
@@ -163,39 +163,17 @@
   return ExecutionFailed ? 1 : Res;
 }
 
-typedef SmallVectorImpl< std::pair<int, const Command *> > FailingCommandList;
-
-static bool ActionFailed(const Action *A,
-                         const FailingCommandList &FailingCommands) {
-
-  if (FailingCommands.empty())
-return false;
-
-  for (FailingCommandList::const_iterator CI = FailingCommands.begin(),
-         CE = FailingCommands.end(); CI != CE; ++CI)
-if (A == &(CI->second->getSource()))
-  return true;
-
-  for (Action::const_iterator AI = A->begin(), AE = A->end(); AI != AE; ++AI)
-if (ActionFailed(*AI, FailingCommands))
-  return true;
-
-  return false;
-}
-
-static bool InputsOk(const Command &C,
-                     const FailingCommandList &FailingCommands) {
-  return !ActionFailed(&C.getSource(), FailingCommands);
-}
-
-void Compilation::ExecuteJobs(const JobList &Jobs,
-                              FailingCommandList &FailingCommands) const {
+void Compilation::ExecuteJobs(
+    const JobList &Jobs,
+    SmallVectorImpl<std::pair<int, const Command *>> &FailingCommands) const {
   for (const auto &Job : Jobs) {
-    if (!InputsOk(Job, FailingCommands))
-      continue;
     const Command *FailingCommand = nullptr;
-    if (int Res = ExecuteCommand(Job, FailingCommand))
+    if (int Res = ExecuteCommand(Job, FailingCommand)) {
       FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+      // Bail as soon as one command fails, so we don't output duplicate error
+      // messages if we die on e.g. the same file.
+      return;
+    }
   }
 }
 


Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-02-12 Thread Justin Lebar via cfe-commits
jlebar added a comment.

espindola reverted this in r260522 because of test failures in 
Driver/output-file-cleanup.c.

The reason I didn't catch this locally is that the test is non-hermetic -- if 
it passed once in an objdir, this patch does not make it fail again.  You have 
to nuke (part of) the objdir before it will fail.

I'll send a patch to make the test hermetic.  Unsure yet whether it's a bug in 
this patch or the test that the test fails at all.


Repository:
  rL LLVM

http://reviews.llvm.org/D16514





Re: [PATCH] D16870: [CUDA] Tweak attribute-based overload resolution to match nvcc behavior.

2016-02-11 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Mostly comments on comments.



Comment at: lib/Sema/SemaCUDA.cpp:71
@@ -70,3 +70,3 @@
 // H  - handled in (x)
-// Preferences: b-best, f-fallback, l-last resort, n-never.
+// Preferences: +:native, *:host-device, o:same side, .:wrong side, -:never.
 //

What if we used the following mapping?

N = native
HD = host+device
SS = same-side
WS = wrong-side
`-` = never

This mimics how we were writing on the whiteboard.


Comment at: lib/Sema/SemaCUDA.cpp:115
@@ -114,2 +114,3 @@
 
-  // (b) Best case scenarios
+  // (b) Calling HostDevice is OK as a fallback that works for everyone.
+  if (CalleeTarget == CFT_HostDevice)

I'm not sure "fallback" is the right word to use here anymore, as HD --> HD 
gets priority HD.


Comment at: lib/Sema/SemaCUDA.cpp:127
@@ -132,9 +126,3 @@
   if (CallerTarget == CFT_HostDevice) {
-// Calling a function that matches compilation mode is OK.
-// Calling a function from the other side is frowned upon.
-if (getLangOpts().CUDAIsDevice)
-  return CalleeTarget == CFT_Device ? CFP_Fallback : QuestionableResult;
-else
-  return (CalleeTarget == CFT_Host || CalleeTarget == CFT_Global)
- ? CFP_Fallback
- : QuestionableResult;
+// It's OK to call a mode-matching function from an HD one.
+if ((getLangOpts().CUDAIsDevice && CalleeTarget == CFT_Device) ||

Suggest "compilation-mode matching"


Comment at: lib/Sema/SemaOverload.cpp:8731
@@ +8730,3 @@
+  // but accepted by both clang and NVCC. However during particular
+  // compilation pass only one call variant is viable. We need to
+  // exclude non-viable overload candidates from consideration based

We should be consistent wrt whether we call it a compilation pass, or (as 
above) compilation mode, or whatever.  (I think "mode" may be right.)

Also please add an article: "During a particular compilation mode".


Comment at: lib/Sema/SemaOverload.cpp:8738
@@ +8737,3 @@
const FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
+bool IgnoreWrongSideFunctions =
+llvm::any_of(Candidates, [&](OverloadCandidate *Cand) {

Maybe name this something more closely tied to what it is -- e.g. 
ContainsSameSideCall -- and then add a comment in the if statement saying 
"Remove wrong-side calls from consideration."


Comment at: lib/Sema/SemaOverload.cpp:8752
@@ +8751,3 @@
+  }),
+   Candidates.end());
+  }

This is an indentation mouthful -- can we pull the lambda out, maybe?


Comment at: lib/Sema/SemaOverload.cpp:8757
@@ -8726,3 +8756,3 @@
   Best = end();
-  for (iterator Cand = begin(); Cand != end(); ++Cand) {
+  for (auto Cand : Candidates)
 if (Cand->Viable)

Suggest auto*, so it's clear we're not copying things.


Comment at: lib/Sema/SemaOverload.cpp:8771
@@ -8741,3 +8770,3 @@
   // function. If not, we have an ambiguity.
-  for (iterator Cand = begin(); Cand != end(); ++Cand) {
+  for (auto Cand : Candidates) {
 if (Cand->Viable &&

auto*


Comment at: test/CodeGenCUDA/function-overload.cu:81
@@ -79,1 +80,3 @@
 
+// NOTE: this is an artefact of split-mode CUDA compilation that we
+// need to mimic. HD functions are sometimes allowed to call H or D

Nit: Suggest American spelling, "artifact", which is much more common in llvm 
codebase.

Maybe also remove this sentence, or move it down somewhere later -- this feels 
like a bad "topic sentence" for the paragraph.  e.g.

HD functions are sometimes allowed to call H or D functions -- this is an 
artifact of the source-to-source splitting performed by nvcc that we need to 
mimic.


Comment at: test/CodeGenCUDA/function-overload.cu:86
@@ +85,3 @@
+// considered at all. For clang both H and D variants will become
+// function overloads. Normally target attribute is considered only if
+// C++ rules can not determine which function is better. However in

This is setting up a contrast between nvcc and clang, so suggest connecting the 
phrases with "but" or "in contrast".  Also suggest being specific that we're 
talking about nvcc -- since split-mode compilation isn't a thing in clang, if 
we just talk about it generally, it's not clear what we're referring to.  e.g.

During device mode compilation in nvcc, host functions aren't present at all, 
so don't participate in overloading.  But in clang, H and D functions are 
present in both compilation modes.  Clang normally uses the target attribute as 
a tiebreaker between overloads with otherwise identical priority, but in order 
to match nvcc's behavior, we sometimes need to wholly discard overloads that 
would not be present during compilation under nvcc.


Comment 

Re: [PATCH] D17111: [CUDA] Added --cuda-noopt-device-debug option to control ptxas' debug info generation.

2016-02-11 Thread Justin Lebar via cfe-commits
jlebar accepted this revision.


Comment at: lib/Driver/Tools.cpp:10696
@@ +10695,3 @@
+// ptxas does not accept -g option if optimization is enabled, so we ignore
+// compiler's -O* options if we want debug info.
+CmdArgs.push_back("-g");

Nit: ignore the compiler's


http://reviews.llvm.org/D17111





r260448 - Bail on compilation as soon as a job fails.

2016-02-10 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 10 16:17:00 2016
New Revision: 260448

URL: http://llvm.org/viewvc/llvm-project?rev=260448&view=rev
Log:
Bail on compilation as soon as a job fails.

Previously we attempted to be smart; if one job failed, we'd run all
jobs that didn't depend on the failing job.

Problem is, this doesn't work well for e.g. CUDA compilation without
-save-temps.  In this case, the device-side and host-side Assemble
actions (which actually are responsible for preprocess, compile,
backend, and assemble, since we're not saving temps) are necessarily
distinct.  So our clever heuristic doesn't help us, and we repeat every
error message once for host and once for each device arch.

The main effect of this change, other than fixing CUDA, is that if you
pass multiple cc files to one instance of clang and you get a compile
error, we'll stop when the first cc1 job fails.

Reviewers: tra, echristo

Subscribers: jhen, cfe-commits

Differential Revision: http://reviews.llvm.org/D16514

Modified:
cfe/trunk/lib/Driver/Compilation.cpp

Modified: cfe/trunk/lib/Driver/Compilation.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Compilation.cpp?rev=260448&r1=260447&r2=260448&view=diff
==
--- cfe/trunk/lib/Driver/Compilation.cpp (original)
+++ cfe/trunk/lib/Driver/Compilation.cpp Wed Feb 10 16:17:00 2016
@@ -163,39 +163,17 @@ int Compilation::ExecuteCommand(const Co
   return ExecutionFailed ? 1 : Res;
 }
 
-typedef SmallVectorImpl< std::pair<int, const Command *> > FailingCommandList;
-
-static bool ActionFailed(const Action *A,
-                         const FailingCommandList &FailingCommands) {
-
-  if (FailingCommands.empty())
-return false;
-
-  for (FailingCommandList::const_iterator CI = FailingCommands.begin(),
-         CE = FailingCommands.end(); CI != CE; ++CI)
-if (A == &(CI->second->getSource()))
-  return true;
-
-  for (Action::const_iterator AI = A->begin(), AE = A->end(); AI != AE; ++AI)
-if (ActionFailed(*AI, FailingCommands))
-  return true;
-
-  return false;
-}
-
-static bool InputsOk(const Command &C,
-                     const FailingCommandList &FailingCommands) {
-  return !ActionFailed(&C.getSource(), FailingCommands);
-}
-
-void Compilation::ExecuteJobs(const JobList &Jobs,
-                              FailingCommandList &FailingCommands) const {
+void Compilation::ExecuteJobs(
+    const JobList &Jobs,
+    SmallVectorImpl<std::pair<int, const Command *>> &FailingCommands) const {
   for (const auto &Job : Jobs) {
-    if (!InputsOk(Job, FailingCommands))
-      continue;
     const Command *FailingCommand = nullptr;
-    if (int Res = ExecuteCommand(Job, FailingCommand))
+    if (int Res = ExecuteCommand(Job, FailingCommand)) {
       FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+      // Bail as soon as one command fails, so we don't output duplicate error
+      // messages if we die on e.g. the same file.
+      return;
+    }
   }
 }
 




Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-02-10 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL260448: Bail on compilation as soon as a job fails. 
(authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D16514?vs=47520&id=47525#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D16514

Files:
  cfe/trunk/lib/Driver/Compilation.cpp

Index: cfe/trunk/lib/Driver/Compilation.cpp
===
--- cfe/trunk/lib/Driver/Compilation.cpp
+++ cfe/trunk/lib/Driver/Compilation.cpp
@@ -163,39 +163,17 @@
   return ExecutionFailed ? 1 : Res;
 }
 
-typedef SmallVectorImpl< std::pair<int, const Command *> > FailingCommandList;
-
-static bool ActionFailed(const Action *A,
-                         const FailingCommandList &FailingCommands) {
-
-  if (FailingCommands.empty())
-return false;
-
-  for (FailingCommandList::const_iterator CI = FailingCommands.begin(),
-         CE = FailingCommands.end(); CI != CE; ++CI)
-if (A == &(CI->second->getSource()))
-  return true;
-
-  for (Action::const_iterator AI = A->begin(), AE = A->end(); AI != AE; ++AI)
-if (ActionFailed(*AI, FailingCommands))
-  return true;
-
-  return false;
-}
-
-static bool InputsOk(const Command &C,
-                     const FailingCommandList &FailingCommands) {
-  return !ActionFailed(&C.getSource(), FailingCommands);
-}
-
-void Compilation::ExecuteJobs(const JobList &Jobs,
-                              FailingCommandList &FailingCommands) const {
+void Compilation::ExecuteJobs(
+    const JobList &Jobs,
+    SmallVectorImpl<std::pair<int, const Command *>> &FailingCommands) const {
   for (const auto &Job : Jobs) {
-    if (!InputsOk(Job, FailingCommands))
-      continue;
     const Command *FailingCommand = nullptr;
-    if (int Res = ExecuteCommand(Job, FailingCommand))
+    if (int Res = ExecuteCommand(Job, FailingCommand)) {
       FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+      // Bail as soon as one command fails, so we don't output duplicate error
+      // messages if we die on e.g. the same file.
+      return;
+    }
   }
 }
 




[PATCH] D17103: [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added reviewers: majnemer, rnk.
jlebar added subscribers: tra, jhen, cfe-commits.

We can't do the right thing, since there's no right thing to do, but at
least we can not crash the compiler.
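
For background, device-side printf is lowered onto CUDA's vprintf, which takes
the format string plus a single buffer into which the scalar arguments are
packed; there is no defined layout for an aggregate passed by value.  Roughly
(a simplified sketch, not the actual codegen):

```
extern "C" __device__ int vprintf(const char *fmt, void *args);

// Approximately what printf("%d %f", i, d) becomes after lowering:
__device__ int lowered(int i, double d) {
  struct { int a0; double a1; } buf = {i, d};  // scalars packed into a buffer
  return vprintf("%d %f", &buf);               // no scheme for an aggregate
}
```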

http://reviews.llvm.org/D17103

Files:
  lib/CodeGen/CGCUDABuiltin.cpp
  test/CodeGenCUDA/printf.cu

Index: test/CodeGenCUDA/printf.cu
===
--- test/CodeGenCUDA/printf.cu
+++ test/CodeGenCUDA/printf.cu
@@ -41,3 +41,12 @@
 printf("%d", 42);
   }
 }
+
+// Check that we don't crash when asked to printf a non-scalar arg.
+struct Struct {
+  int x;
+  int y;
+};
+__device__ void PrintfNonScalar() {
+  printf("%d", Struct());
+}
Index: lib/CodeGen/CGCUDABuiltin.cpp
===
--- lib/CodeGen/CGCUDABuiltin.cpp
+++ lib/CodeGen/CGCUDABuiltin.cpp
@@ -83,6 +83,11 @@
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
+  // We don't know how to emit non-scalar varargs, so just remove them.
+  Args.erase(std::remove_if(Args.begin() + 1, Args.end(),
+                            [](const CallArg &A) { return !A.RV.isScalar(); }),
+             Args.end());
+
   // Construct and fill the args buffer that we'll pass to vprintf.
   llvm::Value *BufferPtr;
   if (Args.size() <= 1) {




[PATCH] D17100: Remove unused ToolChain arg from Driver::ConstructPhaseAction and BuildAction.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: echristo.
jlebar added a subscriber: cfe-commits.

Actions don't depend on the toolchain; they get bound to a particular
toolchain via BindArch.

No functional changes.

http://reviews.llvm.org/D17100

Files:
  include/clang/Driver/Driver.h
  lib/Driver/Driver.cpp

Index: lib/Driver/Driver.cpp
===
--- lib/Driver/Driver.cpp
+++ lib/Driver/Driver.cpp
@@ -510,8 +510,7 @@
   if (TC.getTriple().isOSBinFormatMachO())
 BuildUniversalActions(*C, C->getDefaultToolChain(), Inputs);
   else
-BuildActions(*C, C->getDefaultToolChain(), C->getArgs(), Inputs,
- C->getActions());
+BuildActions(*C, C->getArgs(), Inputs, C->getActions());
 
   if (CCCPrintPhases) {
 PrintActions(*C);
@@ -625,7 +624,7 @@
   if (TC.getTriple().isOSBinFormatMachO())
 BuildUniversalActions(C, TC, Inputs);
   else
-BuildActions(C, TC, C.getArgs(), Inputs, C.getActions());
+BuildActions(C, C.getArgs(), Inputs, C.getActions());
 
   BuildJobs(C);
 
@@ -1036,7 +1035,7 @@
 Archs.push_back(Args.MakeArgString(TC.getDefaultUniversalArchName()));
 
   ActionList SingleActions;
-  BuildActions(C, TC, Args, BAInputs, SingleActions);
+  BuildActions(C, Args, BAInputs, SingleActions);
 
   // Add in arch bindings for every top level action, as well as lipo and
   // dsymutil steps if needed.
@@ -1322,8 +1321,7 @@
   assert(C.getCudaDeviceToolChain() &&
  "Missing toolchain for device-side compilation.");
   ActionList CudaDeviceActions;
-  C.getDriver().BuildActions(C, *C.getCudaDeviceToolChain(), Args,
- CudaDeviceInputs, CudaDeviceActions);
+  C.getDriver().BuildActions(C, Args, CudaDeviceInputs, CudaDeviceActions);
   assert(GpuArchList.size() == CudaDeviceActions.size() &&
  "Failed to create actions for all devices");
 
@@ -1387,9 +1385,8 @@
   ActionList({FatbinAction}));
 }
 
-void Driver::BuildActions(Compilation &C, const ToolChain &TC,
-                          DerivedArgList &Args, const InputList &Inputs,
-                          ActionList &Actions) const {
+void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
+                          const InputList &Inputs, ActionList &Actions) const {
   llvm::PrettyStackTraceString CrashInfo("Building compilation actions");
 
   if (!SuppressMissingInputWarning && Inputs.empty()) {
@@ -1516,7 +1513,7 @@
 continue;
 
   // Otherwise construct the appropriate action.
-  Current = ConstructPhaseAction(C, TC, Args, Phase, Current);
+  Current = ConstructPhaseAction(C, Args, Phase, Current);
 
   if (InputType == types::TY_CUDA && Phase == CudaInjectionPhase) {
 Current = buildCudaActions(C, Args, InputArg, Current, Actions);
@@ -1553,9 +1550,8 @@
   Args.ClaimAllArgs(options::OPT_cuda_host_only);
 }
 
-Action *Driver::ConstructPhaseAction(Compilation &C, const ToolChain &TC,
-                                     const ArgList &Args, phases::ID Phase,
-                                     Action *Input) const {
+Action *Driver::ConstructPhaseAction(Compilation &C, const ArgList &Args,
+                                     phases::ID Phase, Action *Input) const {
   llvm::PrettyStackTraceString CrashInfo("Constructing phase actions");
   // Build the appropriate action.
   switch (Phase) {
Index: include/clang/Driver/Driver.h
===
--- include/clang/Driver/Driver.h
+++ include/clang/Driver/Driver.h
@@ -299,12 +299,10 @@
   /// given arguments, which are only done for a single architecture.
   ///
   /// \param C - The compilation that is being built.
-  /// \param TC - The default host tool chain.
   /// \param Args - The input arguments.
   /// \param Actions - The list to store the resulting actions onto.
-  void BuildActions(Compilation &C, const ToolChain &TC,
-                    llvm::opt::DerivedArgList &Args, const InputList &Inputs,
-                    ActionList &Actions) const;
+  void BuildActions(Compilation &C, llvm::opt::DerivedArgList &Args,
+                    const InputList &Inputs, ActionList &Actions) const;
 
   /// BuildUniversalActions - Construct the list of actions to perform
   /// for the given arguments, which may require a universal build.
@@ -376,9 +374,8 @@
   /// ConstructAction - Construct the appropriate action to do for
   /// \p Phase on the \p Input, taking in to account arguments
   /// like -fsyntax-only or --analyze.
-  Action *ConstructPhaseAction(Compilation &C, const ToolChain &TC,
-                               const llvm::opt::ArgList &Args, phases::ID Phase,
-                               Action *Input) const;
+  Action *ConstructPhaseAction(Compilation &C, const llvm::opt::ArgList &Args,
+                               phases::ID Phase, Action *Input) const;
 
   /// BuildJobsForAction - Construct the jobs to perform for the action \p A and
   /// return an InputInfo for the result of running \p A.  Will only construct

Re: [PATCH] D17111: [CUDA] pass debug options to ptxas.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Driver/Tools.cpp:10707
@@ +10706,3 @@
+// ptxas does not accept -g option if optimization is enabled, so we ignore
+// compiler's -O* options if we want debug info.
+CmdArgs.push_back("-g");

I think this would be very surprising to users.  -g does not usually have a 
large performance impact, so -O2 -g does not generally mean "generate slow 
code," as far as I know.  I'm concerned that this will result in people 
accidentally compiling with ptxas -O0 (which is why I didn't do it like this to 
begin with).

Can we accomplish this in a more explicit way?


http://reviews.llvm.org/D17111





Re: [PATCH] D17103: [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Yeah, I have no idea what's the right thing to do here.  We can always pass a 
null pointer, that's easy.  David, Reid, do you know what is the correct 
behavior?


http://reviews.llvm.org/D17103





Re: [PATCH] D17103: [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar added a comment.

In http://reviews.llvm.org/D17103#349245, @hfinkel wrote:

> In http://reviews.llvm.org/D17103#349182, @jlebar wrote:
>
> > Yeah, I have no idea what's the right thing to do here.  We can always pass 
> > a null pointer, that's easy.  David, Reid, do you know what is the correct 
> > behavior?
>
>
> I think we need to diagnose / reject this during semantic analysis (and then 
> put a reasonable assert in the backend).


Two things.

a) That doesn't seem to be what we do in regular C++.  It will happily let you 
pass a Struct in with only a warning.
b) At the moment, we don't have the capability to do a proper semantic analysis 
of this.  The issue is, when doing sema checking of __host__ __device__ 
functions, we don't know whether the function will end up being codegen'ed for 
device.  And the semantics of cuda are that it's OK to do things that are 
illegal in device mode from __host__ __device__ functions, so long as you never 
codegen those functions for the device.

We have a plan to address (b) (basically, when doing sema checking, buffer any 
errors we would emit if we were to codegen for device; then we can emit all 
those errors right before codegen), but it's a much bigger thing.  Until then, 
we need to do *something* other than crash here, even if we add additional sema 
checking for plain __device__ fns.
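
A minimal sketch of those semantics (hypothetical code, not from the thread):

```
void host_only();  // implicitly __host__: not callable from device code

__host__ __device__ void hd() {
  host_only();  // ill-formed only if hd() is codegen'ed for the device
}

void host_caller() { hd(); }  // host-side use only: no device codegen, no error
```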


http://reviews.llvm.org/D17103





Re: [PATCH] D17103: [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 47569.
jlebar added a comment.

Error out with CGM.ErrorUnsupported when we receive a non-scalar arg.


http://reviews.llvm.org/D17103

Files:
  lib/CodeGen/CGCUDABuiltin.cpp
  test/CodeGenCUDA/printf-aggregate.cu

Index: test/CodeGenCUDA/printf-aggregate.cu
===
--- /dev/null
+++ test/CodeGenCUDA/printf-aggregate.cu
@@ -0,0 +1,17 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: not %clang_cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -emit-llvm \
+// RUN:   -o - %s 2>&1 | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// Check that we don't crash when asked to printf a non-scalar arg.
+struct Struct {
+  int x;
+  int y;
+};
+__device__ void PrintfNonScalar() {
+  // CHECK: cannot compile this non-scalar arg to printf
+  printf("%d", Struct());
+}
Index: lib/CodeGen/CGCUDABuiltin.cpp
===
--- lib/CodeGen/CGCUDABuiltin.cpp
+++ lib/CodeGen/CGCUDABuiltin.cpp
@@ -83,6 +83,13 @@
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
+  // We don't know how to emit non-scalar varargs.
+  if (std::any_of(Args.begin() + 1, Args.end(),
+                  [](const CallArg &A) { return !A.RV.isScalar(); })) {
+    CGM.ErrorUnsupported(E, "non-scalar arg to printf");
+    return RValue::getIgnored();
+  }
+
   // Construct and fill the args buffer that we'll pass to vprintf.
   llvm::Value *BufferPtr;
   if (Args.size() <= 1) {




Re: [PATCH] D17103: [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.
Closed by commit rL260479: [CUDA] Don't crash when trying to printf a 
non-scalar object. (authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17103?vs=47569&id=47572#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17103

Files:
  cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
  cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu

Index: cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
===
--- cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
+++ cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
@@ -83,6 +83,13 @@
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
+  // We don't know how to emit non-scalar varargs.
+  if (std::any_of(Args.begin() + 1, Args.end(),
+                  [](const CallArg &A) { return !A.RV.isScalar(); })) {
+    CGM.ErrorUnsupported(E, "non-scalar arg to printf");
+    return RValue::get(llvm::ConstantInt::get(IntTy, 0));
+  }
+
   // Construct and fill the args buffer that we'll pass to vprintf.
   llvm::Value *BufferPtr;
   if (Args.size() <= 1) {
Index: cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu
===
--- cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu
+++ cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu
@@ -0,0 +1,17 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: not %clang_cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -emit-llvm \
+// RUN:   -o - %s 2>&1 | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// Check that we don't crash when asked to printf a non-scalar arg.
+struct Struct {
+  int x;
+  int y;
+};
+__device__ void PrintfNonScalar() {
+  // CHECK: cannot compile this non-scalar arg to printf
+  printf("%d", Struct());
+}




r260478 - Remove unused ToolChain arg from Driver::ConstructPhaseAction and BuildAction.

2016-02-10 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 10 20:00:50 2016
New Revision: 260478

URL: http://llvm.org/viewvc/llvm-project?rev=260478&view=rev
Log:
Remove unused ToolChain arg from Driver::ConstructPhaseAction and BuildAction.

Summary:
Actions don't depend on the toolchain; they get bound to a particular
toolchain via BindArch.

No functional changes.

Reviewers: echristo

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D17100

Modified:
cfe/trunk/include/clang/Driver/Driver.h
cfe/trunk/lib/Driver/Driver.cpp

Modified: cfe/trunk/include/clang/Driver/Driver.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/Driver.h?rev=260478&r1=260477&r2=260478&view=diff
==
--- cfe/trunk/include/clang/Driver/Driver.h (original)
+++ cfe/trunk/include/clang/Driver/Driver.h Wed Feb 10 20:00:50 2016
@@ -299,12 +299,10 @@ public:
   /// given arguments, which are only done for a single architecture.
   ///
   /// \param C - The compilation that is being built.
-  /// \param TC - The default host tool chain.
   /// \param Args - The input arguments.
   /// \param Actions - The list to store the resulting actions onto.
-  void BuildActions(Compilation &C, const ToolChain &TC,
-                    llvm::opt::DerivedArgList &Args, const InputList &Inputs,
-                    ActionList &Actions) const;
+  void BuildActions(Compilation &C, llvm::opt::DerivedArgList &Args,
+                    const InputList &Inputs, ActionList &Actions) const;
 
   /// BuildUniversalActions - Construct the list of actions to perform
   /// for the given arguments, which may require a universal build.
@@ -376,9 +374,8 @@ public:
   /// ConstructAction - Construct the appropriate action to do for
   /// \p Phase on the \p Input, taking in to account arguments
   /// like -fsyntax-only or --analyze.
-  Action *ConstructPhaseAction(Compilation &C, const ToolChain &TC,
-                               const llvm::opt::ArgList &Args, phases::ID Phase,
-                               Action *Input) const;
+  Action *ConstructPhaseAction(Compilation &C, const llvm::opt::ArgList &Args,
+                               phases::ID Phase, Action *Input) const;
 
   /// BuildJobsForAction - Construct the jobs to perform for the action \p A 
and
   /// return an InputInfo for the result of running \p A.  Will only construct

Modified: cfe/trunk/lib/Driver/Driver.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/Driver.cpp?rev=260478&r1=260477&r2=260478&view=diff
==
--- cfe/trunk/lib/Driver/Driver.cpp (original)
+++ cfe/trunk/lib/Driver/Driver.cpp Wed Feb 10 20:00:50 2016
@@ -510,8 +510,7 @@ Compilation *Driver::BuildCompilation(Ar
   if (TC.getTriple().isOSBinFormatMachO())
 BuildUniversalActions(*C, C->getDefaultToolChain(), Inputs);
   else
-BuildActions(*C, C->getDefaultToolChain(), C->getArgs(), Inputs,
- C->getActions());
+BuildActions(*C, C->getArgs(), Inputs, C->getActions());
 
   if (CCCPrintPhases) {
 PrintActions(*C);
@@ -625,7 +624,7 @@ void Driver::generateCompilationDiagnost
   if (TC.getTriple().isOSBinFormatMachO())
 BuildUniversalActions(C, TC, Inputs);
   else
-BuildActions(C, TC, C.getArgs(), Inputs, C.getActions());
+BuildActions(C, C.getArgs(), Inputs, C.getActions());
 
   BuildJobs(C);
 
@@ -1036,7 +1035,7 @@ void Driver::BuildUniversalActions(Compi
 Archs.push_back(Args.MakeArgString(TC.getDefaultUniversalArchName()));
 
   ActionList SingleActions;
-  BuildActions(C, TC, Args, BAInputs, SingleActions);
+  BuildActions(C, Args, BAInputs, SingleActions);
 
   // Add in arch bindings for every top level action, as well as lipo and
   // dsymutil steps if needed.
@@ -1322,8 +1321,7 @@ static Action *buildCudaActions(Compilat
   assert(C.getCudaDeviceToolChain() &&
  "Missing toolchain for device-side compilation.");
   ActionList CudaDeviceActions;
-  C.getDriver().BuildActions(C, *C.getCudaDeviceToolChain(), Args,
- CudaDeviceInputs, CudaDeviceActions);
+  C.getDriver().BuildActions(C, Args, CudaDeviceInputs, CudaDeviceActions);
   assert(GpuArchList.size() == CudaDeviceActions.size() &&
  "Failed to create actions for all devices");
 
@@ -1387,9 +1385,8 @@ static Action *buildCudaActions(Compilat
   ActionList({FatbinAction}));
 }
 
-void Driver::BuildActions(Compilation &C, const ToolChain &TC,
-                          DerivedArgList &Args, const InputList &Inputs,
-                          ActionList &Actions) const {
+void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
+                          const InputList &Inputs, ActionList &Actions) const {
   llvm::PrettyStackTraceString CrashInfo("Building compilation actions");
 
   if (!SuppressMissingInputWarning && Inputs.empty()) {
@@ -1516,7 +1513,7 @@ void Driver::BuildActions(Compilation 
 continue;
 
   // Otherwise construct the appropriate action.
-  

Re: [PATCH] D17100: Remove unused ToolChain arg from Driver::ConstructPhaseAction and BuildAction.

2016-02-10 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL260478: Remove unused ToolChain arg from 
Driver::ConstructPhaseAction and BuildAction. (authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17100?vs=47526&id=47571#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17100

Files:
  cfe/trunk/include/clang/Driver/Driver.h
  cfe/trunk/lib/Driver/Driver.cpp

Index: cfe/trunk/include/clang/Driver/Driver.h
===
--- cfe/trunk/include/clang/Driver/Driver.h
+++ cfe/trunk/include/clang/Driver/Driver.h
@@ -299,12 +299,10 @@
   /// given arguments, which are only done for a single architecture.
   ///
   /// \param C - The compilation that is being built.
-  /// \param TC - The default host tool chain.
   /// \param Args - The input arguments.
   /// \param Actions - The list to store the resulting actions onto.
-  void BuildActions(Compilation &C, const ToolChain &TC,
-                    llvm::opt::DerivedArgList &Args, const InputList &Inputs,
-                    ActionList &Actions) const;
+  void BuildActions(Compilation &C, llvm::opt::DerivedArgList &Args,
+                    const InputList &Inputs, ActionList &Actions) const;
 
   /// BuildUniversalActions - Construct the list of actions to perform
   /// for the given arguments, which may require a universal build.
@@ -376,9 +374,8 @@
   /// ConstructAction - Construct the appropriate action to do for
   /// \p Phase on the \p Input, taking in to account arguments
   /// like -fsyntax-only or --analyze.
-  Action *ConstructPhaseAction(Compilation &C, const ToolChain &TC,
-                               const llvm::opt::ArgList &Args, phases::ID Phase,
-                               Action *Input) const;
+  Action *ConstructPhaseAction(Compilation &C, const llvm::opt::ArgList &Args,
+                               phases::ID Phase, Action *Input) const;
 
   /// BuildJobsForAction - Construct the jobs to perform for the action \p A and
   /// return an InputInfo for the result of running \p A.  Will only construct
Index: cfe/trunk/lib/Driver/Driver.cpp
===
--- cfe/trunk/lib/Driver/Driver.cpp
+++ cfe/trunk/lib/Driver/Driver.cpp
@@ -510,8 +510,7 @@
   if (TC.getTriple().isOSBinFormatMachO())
 BuildUniversalActions(*C, C->getDefaultToolChain(), Inputs);
   else
-BuildActions(*C, C->getDefaultToolChain(), C->getArgs(), Inputs,
- C->getActions());
+BuildActions(*C, C->getArgs(), Inputs, C->getActions());
 
   if (CCCPrintPhases) {
 PrintActions(*C);
@@ -625,7 +624,7 @@
   if (TC.getTriple().isOSBinFormatMachO())
 BuildUniversalActions(C, TC, Inputs);
   else
-BuildActions(C, TC, C.getArgs(), Inputs, C.getActions());
+BuildActions(C, C.getArgs(), Inputs, C.getActions());
 
   BuildJobs(C);
 
@@ -1036,7 +1035,7 @@
 Archs.push_back(Args.MakeArgString(TC.getDefaultUniversalArchName()));
 
   ActionList SingleActions;
-  BuildActions(C, TC, Args, BAInputs, SingleActions);
+  BuildActions(C, Args, BAInputs, SingleActions);
 
   // Add in arch bindings for every top level action, as well as lipo and
   // dsymutil steps if needed.
@@ -1322,8 +1321,7 @@
   assert(C.getCudaDeviceToolChain() &&
  "Missing toolchain for device-side compilation.");
   ActionList CudaDeviceActions;
-  C.getDriver().BuildActions(C, *C.getCudaDeviceToolChain(), Args,
- CudaDeviceInputs, CudaDeviceActions);
+  C.getDriver().BuildActions(C, Args, CudaDeviceInputs, CudaDeviceActions);
   assert(GpuArchList.size() == CudaDeviceActions.size() &&
  "Failed to create actions for all devices");
 
@@ -1387,9 +1385,8 @@
   ActionList({FatbinAction}));
 }
 
-void Driver::BuildActions(Compilation &C, const ToolChain &TC,
-                          DerivedArgList &Args, const InputList &Inputs,
-                          ActionList &Actions) const {
+void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
+                          const InputList &Inputs, ActionList &Actions) const {
   llvm::PrettyStackTraceString CrashInfo("Building compilation actions");
 
   if (!SuppressMissingInputWarning && Inputs.empty()) {
@@ -1516,7 +1513,7 @@
 continue;
 
   // Otherwise construct the appropriate action.
-  Current = ConstructPhaseAction(C, TC, Args, Phase, Current);
+  Current = ConstructPhaseAction(C, Args, Phase, Current);
 
   if (InputType == types::TY_CUDA && Phase == CudaInjectionPhase) {
 Current = buildCudaActions(C, Args, InputArg, Current, Actions);
@@ -1553,9 +1550,8 @@
   Args.ClaimAllArgs(options::OPT_cuda_host_only);
 }
 
-Action *Driver::ConstructPhaseAction(Compilation &C, const ToolChain &TC,
-                                     const ArgList &Args, phases::ID Phase,
-                                     Action *Input) const {
+Action *Driver::ConstructPhaseAction(Compilation &C, const ArgList &Args,
+                                     phases::ID Phase, Action *Input) const {

r260479 - [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Wed Feb 10 20:00:52 2016
New Revision: 260479

URL: http://llvm.org/viewvc/llvm-project?rev=260479&view=rev
Log:
[CUDA] Don't crash when trying to printf a non-scalar object.

Summary:
We can't do the right thing, since there's no right thing to do, but at
least we can not crash the compiler.

Reviewers: majnemer, rnk

Subscribers: cfe-commits, jhen, tra

Differential Revision: http://reviews.llvm.org/D17103

Added:
cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu
Modified:
cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp

Modified: cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp?rev=260479&r1=260478&r2=260479&view=diff
==
--- cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp Wed Feb 10 20:00:52 2016
@@ -83,6 +83,13 @@ CodeGenFunction::EmitCUDADevicePrintfCal
                E->getDirectCallee()->getType()->getAs<FunctionProtoType>(),
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
+  // We don't know how to emit non-scalar varargs.
+  if (std::any_of(Args.begin() + 1, Args.end(),
+                  [](const CallArg &A) { return !A.RV.isScalar(); })) {
+    CGM.ErrorUnsupported(E, "non-scalar arg to printf");
+    return RValue::get(llvm::ConstantInt::get(IntTy, 0));
+  }
+
   // Construct and fill the args buffer that we'll pass to vprintf.
   llvm::Value *BufferPtr;
   if (Args.size() <= 1) {

Added: cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu?rev=260479&view=auto
==
--- cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu (added)
+++ cfe/trunk/test/CodeGenCUDA/printf-aggregate.cu Wed Feb 10 20:00:52 2016
@@ -0,0 +1,17 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: not %clang_cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -emit-llvm \
+// RUN:   -o - %s 2>&1 | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// Check that we don't crash when asked to printf a non-scalar arg.
+struct Struct {
+  int x;
+  int y;
+};
+__device__ void PrintfNonScalar() {
+  // CHECK: cannot compile this non-scalar arg to printf
+  printf("%d", Struct());
+}




Re: [PATCH] D17103: [CUDA] Don't crash when trying to printf a non-scalar object.

2016-02-10 Thread Justin Lebar via cfe-commits
jlebar added a comment.

OK, talked to Reid irl.  Since this is just printf, not general varargs 
handling, the Simplest Thing That Could Possibly Work is to error-unsupported.  
Once we fix sema as described above, we can move the check there.  Will update 
the patch, thanks everyone.


http://reviews.llvm.org/D17103





[PATCH] D17051: Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: majnemer.
jlebar added a subscriber: cfe-commits.

This isn't a FileCheck directive; it does nothing.

http://reviews.llvm.org/D17051

Files:
  test/CodeGenCXX/optnone-and-attributes.cpp
  test/CodeGenCXX/optnone-class-members.cpp
  test/CodeGenCXX/optnone-def-decl.cpp
  test/CodeGenCXX/optnone-templates.cpp
  test/Modules/ModuleDebugInfo.cpp
  test/Modules/ModuleDebugInfo.m

Index: test/Modules/ModuleDebugInfo.m
===
--- test/Modules/ModuleDebugInfo.m
+++ test/Modules/ModuleDebugInfo.m
@@ -31,8 +31,9 @@
 // CHECK: ![[MODULE]] = !DIModule(scope: null, name: "DebugObjC
 
 // CHECK: ![[TD_ENUM:.*]] = !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: elements:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
 // CHECK-SAME: name: "FwdDecl",
@@ -45,26 +46,30 @@
 // CHECK-SAME: elements:
 
 // CHECK: ![[TD_UNION:.*]] = !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: elements:
+// CHECK-SAME: )
 
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "TypedefUnion",
 // CHECK-SAME:   baseType: ![[TD_UNION]])
 
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "TypedefEnum",
 // CHECK-SAME:   baseType: ![[TD_ENUM:.*]])
 
 // CHECK: ![[TD_STRUCT:.*]] = !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: elements:
+// CHECK-SAME: )
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "TypedefStruct",
 // CHECK-SAME:   baseType: ![[TD_STRUCT]])
 
 // CHECK: !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DISubprogram(name: "+[ObjCClass classMethod]",
 // CHECK-SAME:  scope: ![[MODULE]],
Index: test/Modules/ModuleDebugInfo.cpp
===
--- test/Modules/ModuleDebugInfo.cpp
+++ test/Modules/ModuleDebugInfo.cpp
@@ -20,25 +20,29 @@
 
 // CHECK: distinct !DICompileUnit(language: DW_LANG_{{.*}}C_plus_plus,
 // CHECK-SAME:isOptimized: false,
-// CHECK-SAME-NOT:splitDebugFilename:
-// CHECK: dwoId:
+// CHECK-NOT: splitDebugFilename:
+// CHECK-SAME:dwoId:
+// CHECK-SAME:)
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum"
 // CHECK-SAME: identifier: "_ZTSN8DebugCXX4EnumE")
 // CHECK: !DINamespace(name: "DebugCXX"
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS11TypedefEnum")
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 // CHECK: !DIEnumerator(name: "e5", value: 5)
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "Struct"
@@ -61,11 +65,11 @@
 // CHECK: !DIDerivedType(tag: DW_TAG_member, name: "_vptr$FwdVirtual"
 
 // CHECK: !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS12TypedefUnion")
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS13TypedefStruct")
 
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "FloatInstatiation"
@@ -75,10 +79,12 @@
 // no mangled name here yet.
 
 // CHECK: !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
 // CHECK-SAME: name: "InAnonymousNamespace",
Index: test/CodeGenCXX/optnone-templates.cpp
===
--- test/CodeGenCXX/optnone-templates.cpp
+++ test/CodeGenCXX/optnone-templates.cpp

Re: [PATCH] D17051: Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: test/Modules/ModuleDebugInfo.cpp:10
@@ -9,3 +9,3 @@
 // RUN: cat %t-mod.ll | FileCheck %s
 // RUN: cat %t-mod.ll | FileCheck --check-prefix=CHECK-NEG %s
 

jroelofs wrote:
> While you're here, may as well shorten these three lines to:
> 
> 
> ```
> // RUN: %clang_cc1 -triple %itanium_abi_triple -x objective-c++ -std=c++11 
> -debug-info-kind=limited -fmodules -fmodule-format=obj -fimplicit-module-maps 
> -DMODULES -fmodules-cache-path=%t %s -I %S/Inputs -I %t -emit-llvm -o %t.ll 
> -mllvm -debug-only=pchcontainer | FileCheck %s --check-prefix=CHECK 
> --check-prefix=CHECK-NEG
> ```
> 
> (as long as you also move the one CHECK-NEG-NOT line up before all of the 
> other `CHECK` lines)
I don't think that means the same thing?  CHECK-NOT: foo checks that "foo" does 
not appear between the last match (or the beginning of the file, if there was 
no last match) *and the next match*.
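
To illustrate (a minimal sketch, not from this patch):

```
// CHECK: attributes [[NORMAL]] =
// CHECK-NOT: optnone
// CHECK: attributes [[OPTNONE]] =
```

Here the CHECK-NOT forbids "optnone" only in the region between the
[[NORMAL]] match and the [[OPTNONE]] match; hoisting it above both CHECK
lines would constrain a different region.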


Comment at: test/Modules/ModuleDebugInfo.cpp:25
@@ -25,1 +24,3 @@
+// CHECK-SAME:dwoId:
+// CHECK-SAME:)
 

jroelofs wrote:
> This `CHECK-SAME` line and all the others are still dead. There's no 
> `--check-prefix=CHECK-SAME`. This is really fishy.
CHECK-SAME means "check on the same line"?


http://reviews.llvm.org/D17051



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: majnemer.
jlebar added subscribers: tra, echristo, jhen, cfe-commits.

This is important for e.g. the following case:

  void sync() { __syncthreads(); }
  void foo() {
    do_something();
    sync();
    do_something_else();
  }

Without this change, if the optimizer does not inline sync() (which it
won't because __syncthreads is also marked as noduplicate, for now
anyway), it is free to perform optimizations on sync() that it would not
be able to perform on __syncthreads(), because sync() is not marked as
convergent.

This change is conservative; the optimizer will remove these attrs where
it can, see r260318, r260319.
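
As a concrete (hypothetical) illustration -- not taken from an actual bug
report -- absent `convergent` an optimizer could make the barrier
control-dependent on extra state:

  // Before: every thread reaches the barrier.
  sync();
  if (cond)
    use_shared_result();

  // A transformation that 'convergent' forbids:
  if (cond) {
    sync();              // only threads with cond==true arrive -> deadlock
    use_shared_result();
  }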

http://reviews.llvm.org/D17056

Files:
  lib/CodeGen/CodeGenModule.cpp
  test/CodeGenCUDA/convergent.cu

Index: test/CodeGenCUDA/convergent.cu
===
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,34 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() { baz(); }
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -1875,6 +1875,14 @@
  B));
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+// Conservatively, mark all functions in CUDA as convergent (meaning, they
+// may call an intrinsically convergent op, such as __syncthreads(), and so
+// can't have certain optimizations applied around them).  LLVM will remove
+// this attribute where it safely can.
+F->addFnAttr(llvm::Attribute::Convergent);
+  }
+
   if (!DontDefer) {
 // All MSVC dtors other than the base dtor are linkonce_odr and delegate to
 // each other bottoming out with the base dtor.  Therefore we emit non-base


Index: test/CodeGenCUDA/convergent.cu
===
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,34 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() { baz(); }
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -1875,6 +1875,14 @@
  B));
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+// Conservatively, mark all functions in CUDA as convergent (meaning, they
+// may call an intrinsically convergent op, such as __syncthreads(), and so
+// can't have certain optimizations applied around them).  LLVM will remove
+// this attribute where it safely can.
+F->addFnAttr(llvm::Attribute::Convergent);
+  }
+
   if (!DontDefer) {
 // All MSVC dtors other than the base dtor are linkonce_odr and delegate to
 // each other bottoming out with the base dtor.  Therefore we emit non-base
___
cfe-commits mailing list
cfe-commits@lists.llvm.org

Re: [PATCH] D17051: Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar marked 3 inline comments as done.


Comment at: test/Modules/ModuleDebugInfo.cpp:10
@@ -9,3 +9,3 @@
 // RUN: cat %t-mod.ll | FileCheck %s
 // RUN: cat %t-mod.ll | FileCheck --check-prefix=CHECK-NEG %s
 

jlebar wrote:
> jlebar wrote:
> > jroelofs wrote:
> > > While you're here, may as well shorten these three lines to:
> > > 
> > > 
> > > ```
> > > // RUN: %clang_cc1 -triple %itanium_abi_triple -x objective-c++ 
> > > -std=c++11 -debug-info-kind=limited -fmodules -fmodule-format=obj 
> > > -fimplicit-module-maps -DMODULES -fmodules-cache-path=%t %s -I %S/Inputs 
> > > -I %t -emit-llvm -o %t.ll -mllvm -debug-only=pchcontainer | FileCheck %s 
> > > --check-prefix=CHECK --check-prefix=CHECK-NEG
> > > ```
> > > 
> > > (as long as you also move the one CHECK-NEG-NOT line up before all of the 
> > > other `CHECK` lines)
> > I don't think that means the same thing?  CHECK-NOT: foo checks that "foo" 
> > does not appear between the last match (or the beginning of the file, if 
> > there was no last match) *and the next match*.
> Oh, I see, it's CHECK-NEG-NOT, so there are no other instances of CHECK-NEG, 
> it's fine how you say.  I'll change it, sure.
Actually, if you don't mind, I'd rather do that in a separate patch, in case it 
breaks something.  I'll send you a patch in a sec.


http://reviews.llvm.org/D17051



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17051: Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar marked 2 inline comments as done.


Comment at: test/Modules/ModuleDebugInfo.cpp:10
@@ -9,3 +9,3 @@
 // RUN: cat %t-mod.ll | FileCheck %s
 // RUN: cat %t-mod.ll | FileCheck --check-prefix=CHECK-NEG %s
 

jroelofs wrote:
> jlebar wrote:
> > jlebar wrote:
> > > jlebar wrote:
> > > > jroelofs wrote:
> > > > > While you're here, may as well shorten these three lines to:
> > > > > 
> > > > > 
> > > > > ```
> > > > > // RUN: %clang_cc1 -triple %itanium_abi_triple -x objective-c++ 
> > > > > -std=c++11 -debug-info-kind=limited -fmodules -fmodule-format=obj 
> > > > > -fimplicit-module-maps -DMODULES -fmodules-cache-path=%t %s -I 
> > > > > %S/Inputs -I %t -emit-llvm -o %t.ll -mllvm -debug-only=pchcontainer | 
> > > > > FileCheck %s --check-prefix=CHECK --check-prefix=CHECK-NEG
> > > > > ```
> > > > > 
> > > > > (as long as you also move the one CHECK-NEG-NOT line up before all of 
> > > > > the other `CHECK` lines)
> > > > I don't think that means the same thing?  CHECK-NOT: foo checks that 
> > > > "foo" does not appear between the last match (or the beginning of the 
> > > > file, if there was no last match) *and the next match*.
> > > Oh, I see, it's CHECK-NEG-NOT, so there are no other instances of 
> > > CHECK-NEG, it's fine how you say.  I'll change it, sure.
> > Actually, if you don't mind, I'd rather do that in a separate patch, in 
> > case it breaks something.  I'll send you a patch in a sec.
> I don't mind.
OK, this is not so simple (or else, perhaps, it's I who is simple).  The output 
is sometimes empty, which makes FileCheck unhappy, and I don't even know what 
else is going on.  I've spent 20 minutes on this now, which I think is a fair 
penalty for stirring this pot.  :)  Feel free to send me a review.


http://reviews.llvm.org/D17051



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17056: Mark all CUDA device-side function defs and decls as convergent.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 47409.
jlebar marked 2 inline comments as done.
jlebar added a comment.

Update per tra's review.


http://reviews.llvm.org/D17056

Files:
  lib/CodeGen/CodeGenModule.cpp
  test/CodeGenCUDA/convergent.cu

Index: test/CodeGenCUDA/convergent.cu
===
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,35 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() { baz(); }
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// HOST-SAME: }
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -1875,6 +1875,14 @@
  B));
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+// Conservatively, mark all functions in CUDA as convergent (meaning, they
+// may call an intrinsically convergent op, such as __syncthreads(), and so
+// can't have certain optimizations applied around them).  LLVM will remove
+// this attribute where it safely can.
+F->addFnAttr(llvm::Attribute::Convergent);
+  }
+
   if (!DontDefer) {
 // All MSVC dtors other than the base dtor are linkonce_odr and delegate to
 // each other bottoming out with the base dtor.  Therefore we emit non-base


Index: test/CodeGenCUDA/convergent.cu
===
--- /dev/null
+++ test/CodeGenCUDA/convergent.cu
@@ -0,0 +1,35 @@
+// REQUIRES: x86-registered-target
+// REQUIRES: nvptx-registered-target
+
+// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | FileCheck -check-prefix DEVICE %s
+
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -emit-llvm \
+// RUN:   -disable-llvm-passes -o - %s | \
+// RUN:  FileCheck -check-prefix HOST %s
+
+#include "Inputs/cuda.h"
+
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3foov
+__device__ void foo() {}
+
+// HOST: Function Attrs:
+// HOST-NOT: convergent
+// HOST-NEXT: define void @_Z3barv
+// DEVICE: Function Attrs:
+// DEVICE-SAME: convergent
+// DEVICE-NEXT: define void @_Z3barv
+__host__ __device__ void baz();
+__host__ __device__ void bar() { baz(); }
+
+// DEVICE: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// DEVICE: attributes [[BAZ_ATTR]] = {
+// DEVICE-SAME: convergent
+// DEVICE-SAME: }
+
+// HOST: declare void @_Z3bazv() [[BAZ_ATTR:#[0-9]+]]
+// HOST: attributes [[BAZ_ATTR]] = {
+// HOST-NOT: convergent
+// NOST-SAME: }
Index: lib/CodeGen/CodeGenModule.cpp
===
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -1875,6 +1875,14 @@
  B));
   }
 
+  if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
+// Conservatively, mark all functions in CUDA as convergent (meaning, they
+// may call an intrinsically convergent op, such as __syncthreads(), and so
+// can't have certain optimizations applied around them).  LLVM will remove
+// this attribute where it safely can.
+F->addFnAttr(llvm::Attribute::Convergent);
+  }
+
   if (!DontDefer) {
 // All MSVC dtors other than the base dtor are linkonce_odr and delegate to
 // each other bottoming out with the base dtor.  Therefore we emit non-base
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D17051: Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar marked an inline comment as done.


Comment at: test/Modules/ModuleDebugInfo.cpp:10
@@ -9,3 +9,3 @@
 // RUN: cat %t-mod.ll | FileCheck %s
 // RUN: cat %t-mod.ll | FileCheck --check-prefix=CHECK-NEG %s
 

jlebar wrote:
> jroelofs wrote:
> > While you're here, may as well shorten these three lines to:
> > 
> > 
> > ```
> > // RUN: %clang_cc1 -triple %itanium_abi_triple -x objective-c++ -std=c++11 
> > -debug-info-kind=limited -fmodules -fmodule-format=obj 
> > -fimplicit-module-maps -DMODULES -fmodules-cache-path=%t %s -I %S/Inputs -I 
> > %t -emit-llvm -o %t.ll -mllvm -debug-only=pchcontainer | FileCheck %s 
> > --check-prefix=CHECK --check-prefix=CHECK-NEG
> > ```
> > 
> > (as long as you also move the one CHECK-NEG-NOT line up before all of the 
> > other `CHECK` lines)
> I don't think that means the same thing?  CHECK-NOT: foo checks that "foo" 
> does not appear between the last match (or the beginning of the file, if 
> there was no last match) *and the next match*.
Oh, I see, it's CHECK-NEG-NOT, so there are no other instances of CHECK-NEG, 
it's fine how you say.  I'll change it, sure.


http://reviews.llvm.org/D17051



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r260334 - Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Tue Feb  9 18:38:15 2016
New Revision: 260334

URL: http://llvm.org/viewvc/llvm-project?rev=260334&view=rev
Log:
Get rid of CHECK-SAME-NOT in tests.

Summary: This isn't a FileCheck directive; it does nothing.

Reviewers: jroelofs

Subscribers: cfe-commits, majnemer

Differential Revision: http://reviews.llvm.org/D17051

Modified:
cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp
cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp
cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp
cfe/trunk/test/CodeGenCXX/optnone-templates.cpp
cfe/trunk/test/Modules/ModuleDebugInfo.cpp
cfe/trunk/test/Modules/ModuleDebugInfo.m

Modified: cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp?rev=260334&r1=260333&r2=260334&view=diff
==============================================================================
--- cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp (original)
+++ cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp Tue Feb  9 18:38:15 2016
@@ -79,4 +79,4 @@ int exported_optnone_func(int a) {
 // CHECK: attributes [[NORETURN]] = { noinline noreturn {{.*}} optnone
 
 // CHECK: attributes [[DLLIMPORT]] =
-// CHECK-SAME-NOT: optnone
+// CHECK-NOT: optnone

Modified: cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp?rev=260334&r1=260333&r2=260334&view=diff
==============================================================================
--- cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp (original)
+++ cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp Tue Feb  9 18:38:15 2016
@@ -159,6 +159,6 @@ int bar() {
 
 
 // CHECK: attributes [[NORMAL]] =
-// CHECK-SAME-NOT: noinline
-// CHECK-SAME-NOT: optnone
+// CHECK-NOT: noinline
+// CHECK-NOT: optnone
 // CHECK: attributes [[OPTNONE]] = {{.*}} noinline {{.*}} optnone

Modified: cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp?rev=260334&r1=260333&r2=260334&view=diff
==============================================================================
--- cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp (original)
+++ cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp Tue Feb  9 18:38:15 2016
@@ -91,5 +91,5 @@ int user_of_forceinline_optnone_function
 
 // CHECK: attributes [[OPTNONE]] = { noinline nounwind optnone {{.*}} }
 // CHECK: attributes [[NORMAL]] =
-// CHECK-SAME-NOT: noinline
-// CHECK-SAME-NOT: optnone
+// CHECK-NOT: noinline
+// CHECK-NOT: optnone

Modified: cfe/trunk/test/CodeGenCXX/optnone-templates.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/optnone-templates.cpp?rev=260334&r1=260333&r2=260334&view=diff
==============================================================================
--- cfe/trunk/test/CodeGenCXX/optnone-templates.cpp (original)
+++ cfe/trunk/test/CodeGenCXX/optnone-templates.cpp Tue Feb  9 18:38:15 2016
@@ -100,5 +100,5 @@ void container3()
 
 
 // CHECK: attributes [[NORMAL]] =
-// CHECK-SAME-NOT: optnone
+// CHECK-NOT: optnone
 // CHECK: attributes [[OPTNONE]] = {{.*}} optnone

Modified: cfe/trunk/test/Modules/ModuleDebugInfo.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Modules/ModuleDebugInfo.cpp?rev=260334&r1=260333&r2=260334&view=diff
==============================================================================
--- cfe/trunk/test/Modules/ModuleDebugInfo.cpp (original)
+++ cfe/trunk/test/Modules/ModuleDebugInfo.cpp Tue Feb  9 18:38:15 2016
@@ -20,25 +20,29 @@
 
 // CHECK: distinct !DICompileUnit(language: DW_LANG_{{.*}}C_plus_plus,
 // CHECK-SAME:isOptimized: false,
-// CHECK-SAME-NOT:splitDebugFilename:
-// CHECK: dwoId:
+// CHECK-NOT: splitDebugFilename:
+// CHECK-SAME:dwoId:
+// CHECK-SAME:)
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum"
 // CHECK-SAME: identifier: "_ZTSN8DebugCXX4EnumE")
 // CHECK: !DINamespace(name: "DebugCXX"
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS11TypedefEnum")
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 // CHECK: !DIEnumerator(name: "e5", value: 5)
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "Struct"
@@ -61,11 +65,11 @@
 // CHECK: !DIDerivedType(tag: DW_TAG_member, name: "_vptr$FwdVirtual"

Re: [PATCH] D17051: Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL260334: Get rid of CHECK-SAME-NOT in tests. (authored by 
jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D17051?vs=47385&id=47397#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D17051

Files:
  cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp
  cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp
  cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp
  cfe/trunk/test/CodeGenCXX/optnone-templates.cpp
  cfe/trunk/test/Modules/ModuleDebugInfo.cpp
  cfe/trunk/test/Modules/ModuleDebugInfo.m

Index: cfe/trunk/test/Modules/ModuleDebugInfo.m
===
--- cfe/trunk/test/Modules/ModuleDebugInfo.m
+++ cfe/trunk/test/Modules/ModuleDebugInfo.m
@@ -31,8 +31,9 @@
 // CHECK: ![[MODULE]] = !DIModule(scope: null, name: "DebugObjC
 
 // CHECK: ![[TD_ENUM:.*]] = !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: elements:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
 // CHECK-SAME: name: "FwdDecl",
@@ -45,26 +46,30 @@
 // CHECK-SAME: elements:
 
 // CHECK: ![[TD_UNION:.*]] = !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: elements:
+// CHECK-SAME: )
 
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "TypedefUnion",
 // CHECK-SAME:   baseType: ![[TD_UNION]])
 
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "TypedefEnum",
 // CHECK-SAME:   baseType: ![[TD_ENUM:.*]])
 
 // CHECK: ![[TD_STRUCT:.*]] = !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: elements:
+// CHECK-SAME: )
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "TypedefStruct",
 // CHECK-SAME:   baseType: ![[TD_STRUCT]])
 
 // CHECK: !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DISubprogram(name: "+[ObjCClass classMethod]",
 // CHECK-SAME:  scope: ![[MODULE]],
Index: cfe/trunk/test/Modules/ModuleDebugInfo.cpp
===
--- cfe/trunk/test/Modules/ModuleDebugInfo.cpp
+++ cfe/trunk/test/Modules/ModuleDebugInfo.cpp
@@ -20,25 +20,29 @@
 
 // CHECK: distinct !DICompileUnit(language: DW_LANG_{{.*}}C_plus_plus,
 // CHECK-SAME:isOptimized: false,
-// CHECK-SAME-NOT:splitDebugFilename:
-// CHECK: dwoId:
+// CHECK-NOT: splitDebugFilename:
+// CHECK-SAME:dwoId:
+// CHECK-SAME:)
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type, name: "Enum"
 // CHECK-SAME: identifier: "_ZTSN8DebugCXX4EnumE")
 // CHECK: !DINamespace(name: "DebugCXX"
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS11TypedefEnum")
 
 // CHECK: !DICompositeType(tag: DW_TAG_enumeration_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 // CHECK: !DIEnumerator(name: "e5", value: 5)
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type, name: "Struct"
@@ -61,11 +65,11 @@
 // CHECK: !DIDerivedType(tag: DW_TAG_member, name: "_vptr$FwdVirtual"
 
 // CHECK: !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS12TypedefUnion")
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
 // CHECK-SAME: identifier: "_ZTS13TypedefStruct")
 
 // CHECK: !DIDerivedType(tag: DW_TAG_typedef, name: "FloatInstatiation"
@@ -75,10 +79,12 @@
 // no mangled name here yet.
 
 // CHECK: !DICompositeType(tag: DW_TAG_union_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
-// CHECK-SAME-NOT: name:
+// CHECK-NOT:  name:
+// CHECK-SAME: )
 
 // CHECK: !DICompositeType(tag: DW_TAG_structure_type,
 // CHECK-SAME: name: "InAnonymousNamespace",

Re: r260334 - Get rid of CHECK-SAME-NOT in tests.

2016-02-09 Thread Justin Lebar via cfe-commits
> The paren check looks redundant.
> And this one.

Fixed in r260346.

On Tue, Feb 9, 2016 at 5:29 PM, Robinson, Paul
<paul_robin...@playstation.sony.com> wrote:
> Well I'll be-- thanks!
> See post-commit comments, see below, tidying up just a bit.
> --paulr
>
>> -----Original Message-----
>> From: cfe-commits [mailto:cfe-commits-boun...@lists.llvm.org] On Behalf Of
>> Justin Lebar via cfe-commits
>> Sent: Tuesday, February 09, 2016 4:38 PM
>> To: cfe-commits@lists.llvm.org
>> Subject: r260334 - Get rid of CHECK-SAME-NOT in tests.
>>
>> Author: jlebar
>> Date: Tue Feb  9 18:38:15 2016
>> New Revision: 260334
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=260334&view=rev
>> Log:
>> Get rid of CHECK-SAME-NOT in tests.
>>
>> Summary: This isn't a FileCheck directive; it does nothing.
>>
>> Reviewers: jroelofs
>>
>> Subscribers: cfe-commits, majnemer
>>
>> Differential Revision: http://reviews.llvm.org/D17051
>>
>> Modified:
>> cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp
>> cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp
>> cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp
>> cfe/trunk/test/CodeGenCXX/optnone-templates.cpp
>> cfe/trunk/test/Modules/ModuleDebugInfo.cpp
>> cfe/trunk/test/Modules/ModuleDebugInfo.m
>>
>> Modified: cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp
>> URL: http://llvm.org/viewvc/llvm-
>> project/cfe/trunk/test/CodeGenCXX/optnone-and-
>> attributes.cpp?rev=260334&r1=260333&r2=260334&view=diff
>> ==============================================================================
>> --- cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp (original)
>> +++ cfe/trunk/test/CodeGenCXX/optnone-and-attributes.cpp Tue Feb  9
>> 18:38:15 2016
>> @@ -79,4 +79,4 @@ int exported_optnone_func(int a) {
>>  // CHECK: attributes [[NORETURN]] = { noinline noreturn {{.*}} optnone
>>
>>  // CHECK: attributes [[DLLIMPORT]] =
>> -// CHECK-SAME-NOT: optnone
>> +// CHECK-NOT: optnone
>>
>> Modified: cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp
>> URL: http://llvm.org/viewvc/llvm-
>> project/cfe/trunk/test/CodeGenCXX/optnone-class-
>> members.cpp?rev=260334&r1=260333&r2=260334&view=diff
>> ==============================================================================
>> --- cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp (original)
>> +++ cfe/trunk/test/CodeGenCXX/optnone-class-members.cpp Tue Feb  9
>> 18:38:15 2016
>> @@ -159,6 +159,6 @@ int bar() {
>>
>>
>>  // CHECK: attributes [[NORMAL]] =
>> -// CHECK-SAME-NOT: noinline
>> -// CHECK-SAME-NOT: optnone
>> +// CHECK-NOT: noinline
>> +// CHECK-NOT: optnone
>>  // CHECK: attributes [[OPTNONE]] = {{.*}} noinline {{.*}} optnone
>>
>> Modified: cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp
>> URL: http://llvm.org/viewvc/llvm-
>> project/cfe/trunk/test/CodeGenCXX/optnone-def-
>> decl.cpp?rev=260334&r1=260333&r2=260334&view=diff
>> ==============================================================================
>> --- cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp (original)
>> +++ cfe/trunk/test/CodeGenCXX/optnone-def-decl.cpp Tue Feb  9 18:38:15
>> 2016
>> @@ -91,5 +91,5 @@ int user_of_forceinline_optnone_function
>>
>>  // CHECK: attributes [[OPTNONE]] = { noinline nounwind optnone {{.*}} }
>>  // CHECK: attributes [[NORMAL]] =
>> -// CHECK-SAME-NOT: noinline
>> -// CHECK-SAME-NOT: optnone
>> +// CHECK-NOT: noinline
>> +// CHECK-NOT: optnone
>>
>> Modified: cfe/trunk/test/CodeGenCXX/optnone-templates.cpp
>> URL: http://llvm.org/viewvc/llvm-
>> project/cfe/trunk/test/CodeGenCXX/optnone-
>> templates.cpp?rev=260334&r1=260333&r2=260334&view=diff
>> ==============================================================================
>> --- cfe/trunk/test/CodeGenCXX/optnone-templates.cpp (original)
>> +++ cfe/trunk/test/CodeGenCXX/optnone-templates.cpp Tue Feb  9 18:38:15
>> 2016
>> @@ -100,5 +100,5 @@ void container3()
>>
>>
>>  // CHECK: attributes [[NORMAL]] =
>> -// CHECK-SAME-NOT: optnone
>> +// CHECK-NOT: optnone
>>  // CHECK: attributes [[OPTNONE]] = {{.*}} optnone
>>
>> Modified: cfe/trunk/test/Modules/ModuleDebugInfo.cpp
>> URL: http://llvm.org/viewvc/llvm-
>> project/cfe/trunk/test/Modules/ModuleDebugInfo.cpp?rev=260334&r1=260333
>> &r2=260334&view=diff
>> ==============================================================================
>> --- cfe/trunk/test/Modules/ModuleDebugInfo.cpp

Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-02-09 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Okay, I see why things don't work as expected without this patch but do work 
for e.g. macos universal binaries.

The reason is, we build a completely separate set of actions for each 
invocation of cc1 -- one for the host compilation, and one for each device 
arch.  Then the logic inside Compilation.cpp, which is in fact trying not to 
display duplicate errors, doesn't work, because it doesn't know that these 
compilations are related.

I think I may be able to fix this.


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16932: [CUDA] Bug 26497 : Remove wrappers for variants already provided by CUDA headers.

2016-02-05 Thread Justin Lebar via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Is it worth having a test, if only for one or two functions?


http://reviews.llvm.org/D16932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16932: [CUDA] Bug 26497 : Remove wrappers for variants already provided by CUDA headers.

2016-02-05 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Oh right, CUDA headers.

We really need to get a buildbot and/or tests in test-suite set up.


http://reviews.llvm.org/D16932



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16870: [CUDA] Tweak attribute-based overload resolution to match nvcc behavior.

2016-02-04 Thread Justin Lebar via cfe-commits
jlebar accepted this revision.
jlebar added a comment.
This revision is now accepted and ready to land.

Looks sane to me.  Just some suggestions on the comments.



Comment at: lib/Sema/SemaCUDA.cpp:71
@@ -70,3 +70,3 @@
 // H  - handled in (x)
-// Preferences: b-best, f-fallback, l-last resort, n-never.
+// Preferences: +:native, *:host-device, o:same side, .:wrong side, -:never.
 //

If we're going to use symbols rather than letters, could we use 4, 3, 2, 1, 0?  
I think that would be easier to follow.


Comment at: lib/Sema/SemaCUDA.cpp:127
@@ -132,9 +126,3 @@
   if (CallerTarget == CFT_HostDevice) {
-// Calling a function that matches compilation mode is OK.
-// Calling a function from the other side is frowned upon.
-if (getLangOpts().CUDAIsDevice)
-  return CalleeTarget == CFT_Device ? CFP_Fallback : QuestionableResult;
-else
-  return (CalleeTarget == CFT_Host || CalleeTarget == CFT_Global)
- ? CFP_Fallback
- : QuestionableResult;
+// It's OK to call mode-matching function from HD one.
+if ((getLangOpts().CUDAIsDevice && CalleeTarget == CFT_Device) ||

Nit: "It's OK to call a mode-matching function from an HD function."


Comment at: lib/Sema/SemaOverload.cpp:8536
@@ +8535,3 @@
+  // compatible with existing code that relies on this. If we see such
+  // a case, return better variant right away.
+  if (S.getLangOpts().CUDA && S.getLangOpts().CUDATargetOverloads &&

Since we have language lawyers on the team, suggest adding articles to comment:

If an HD function calls a function which has host-only and device-only 
overloads, nvcc sees only the host-side function during host compilation and 
only the device function during device-side compilation.  (This appears to be a 
side-effect of its splitting of host and device code into separate TUs.)  Alas 
we need to be compatible with existing code that relies on this, so if we see 
such a case, return the better variant right away.

I actually might suggest rephrasing this a bit more, to something like:

When performing host-side compilation, nvcc doesn't see device functions, and 
similarly when performing device-side compilation, nvcc doesn't see host 
functions.  (This is a consequence of the fact that it splits host and device 
code into separate TUs.)  We see all functions in both compilation modes, so to 
match nvcc's behavior, we need to exclude some overload candidates from 
consideration based only on their host/device attributes.  Specifically, if one 
candidate call is WrongSide and the other is Native or SameSide, we ignore the 
WrongSide candidate.  If we don't return early here, we'll consider the CUDA 
target attributes again later in this function, as a tiebreaker between calls 
with otherwise identical priority according to the regular C++ overloading 
rules.
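
A small (hypothetical) example of the pattern this special case handles:

```
__host__ void f(int);    // WrongSide when compiling for the device
__device__ void f(int);  // SameSide when compiling for the device

__host__ __device__ void g() {
  f(0);  // nvcc sees only one overload per compilation; we see both,
         // so we drop the WrongSide candidate before overload ranking.
}
```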


Comment at: test/CodeGenCUDA/function-overload.cu:96
@@ +95,3 @@
+
+// In this case during host compilation we expect to cal function
+// template even if __device__ function may be available and allowed

call


http://reviews.llvm.org/D16870



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16638: [CUDA] Added device-side system call decls and related wrappers.

2016-02-03 Thread Justin Lebar via cfe-commits
jlebar added a comment.

lgtm.  Thank you.


http://reviews.llvm.org/D16638



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16638: [CUDA] Added device-side system call decls and related wrappers.

2016-02-03 Thread Justin Lebar via cfe-commits
jlebar added a comment.

lg with one question about printf.



Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:237
@@ +236,3 @@
+// device-side declaration for it.
+__device__ int printf(const char *, ...);
+} // extern "C"

I think we want an attribute on this so that we know it's printf-like, so you 
get appropriate format-string warnings.  Unless the compiler is going to add 
said attribute automatically based on just the function name.
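
Something along these lines, say (a sketch -- the exact declaration isn't
part of this patch):

```
extern "C" __device__ int printf(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));
```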


http://reviews.llvm.org/D16638



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16638: [CUDA] Added device-side std::{malloc/free}

2016-02-02 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:215
@@ +214,3 @@
+// Device-side CUDA system calls.
+// 
http://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#system-calls
+

It seems that only vprintf, free, malloc, and __assertfail are syscalls?  If so 
please rejigger this comment so that it's clear what it is and isn't covering.


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:232
@@ +231,3 @@
+// Clang will convert printf into vprintf, but we still need
+// device-side declaration for it.
+__device__ int printf(const char *, ...);

I'd prefer to keep the information from the previous comment: This declaration 
is there for type-safety, not because things will fail to compile if it is 
removed.

Otherwise someone may look at this, delete this definition, see that everything 
still works, and conclude that they can safely remove this line.


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:234
@@ +233,3 @@
+__device__ int printf(const char *, ...);
+}
+

Please put "// extern "C"" here.


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:236
@@ +235,3 @@
+
+// We also need device-side std::malloc and std::free
+namespace std {

Nit: Period


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:242
@@ +241,3 @@
+}
+}
+

Please add "//namespace std"


http://reviews.llvm.org/D16638



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-29 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Talking to echristo irl, he would like to know why we don't have this problem 
with mac universal binaries -- or, do we?  He would like to be consistent; I'm 
onboard with that.


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D15305: [CUDA] Do not allow dynamic initialization of global device side variables.

2016-01-29 Thread Justin Lebar via cfe-commits
jlebar added a comment.

jingyue/jpienaar/rsmith - friendly ping?  Without this, -O0 builds don't work, 
because they emit empty global initializers that don't get optimized out.
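
For reference, a sketch of the failing pattern (hypothetical names):

```
struct S {
  __device__ S() {}  // non-trivial (though empty) constructor
};
__device__ S s;      // at -O0 the emitted initializer stub survives,
                     // and device-side globals have no way to run it
```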


http://reviews.llvm.org/D15305



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-29 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Eric, are you OK with this going in, or do you want to consider alternatives?


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar marked an inline comment as done.


Comment at: lib/CodeGen/CGCUDABuiltin.cpp:109
@@ -106,1 +108,3 @@
+// stacksave/stackrestore intrinsics, which cause ptxas to choke.
+auto *Alloca = new llvm::AllocaInst(
 llvm::Type::getInt8Ty(Ctx), llvm::ConstantInt::get(Int32Ty, BufSize),

echristo wrote:
> Not quite, you'll want to use AllocaInsertPt for this or even 
> CreateTempAlloca.
> 
> 
Aha.  Used AllocaInsertPt because it doesn't seem that there's an overload of 
CreateTempAlloca that takes an explicit size.
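
Roughly like this (a sketch; it assumes the Ctx/Int32Ty/BufSize/BufAlign
values already computed in this function):

```
// Inserting before AllocaInsertPt pins the buffer to the entry block,
// so a printf inside a conditional never produces a dynamic alloca.
auto *Alloca = new llvm::AllocaInst(
    llvm::Type::getInt8Ty(Ctx), llvm::ConstantInt::get(Int32Ty, BufSize),
    BufAlign, "printf_arg_buf", AllocaInsertPt);
```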


http://reviews.llvm.org/D16664



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar marked an inline comment as done.
jlebar added a comment.

http://reviews.llvm.org/D16664



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 46300.
jlebar marked an inline comment as done.
jlebar added a comment.

Address tra's review comment (rename flag).


http://reviews.llvm.org/D16514

Files:
  include/clang/Driver/Compilation.h
  include/clang/Driver/Driver.h
  include/clang/Driver/Options.td
  lib/Driver/Compilation.cpp
  lib/Driver/Driver.cpp

Index: lib/Driver/Driver.cpp
===
--- lib/Driver/Driver.cpp
+++ lib/Driver/Driver.cpp
@@ -58,7 +58,8 @@
   CCPrintHeadersFilename(nullptr), CCLogDiagnosticsFilename(nullptr),
   CCCPrintBindings(false), CCPrintHeaders(false), CCLogDiagnostics(false),
   CCGenDiagnostics(false), CCCGenericGCCName(""), CheckInputsExist(true),
-  CCCUsePCH(true), SuppressMissingInputWarning(false) {
+  CCCUsePCH(true), SuppressMissingInputWarning(false),
+  StopOnJobFailure(false) {
 
   // Provide a sane fallback if no VFS is specified.
   if (!this->VFS)
@@ -505,6 +506,16 @@
   InputList Inputs;
   BuildInputs(C->getDefaultToolChain(), *TranslatedArgs, Inputs);
 
+  // StopOnJobFailure defaults to false, except for CUDA compilations.
+  if (Arg *A = C->getArgs().getLastArg(options::OPT_stop_on_failure,
+   options::OPT_no_stop_on_failure))
+StopOnJobFailure = A->getOption().matches(options::OPT_stop_on_failure);
+  else
+StopOnJobFailure =
+llvm::any_of(Inputs, [](const std::pair<types::ID, const Arg *> &I) {
+  return I.first == types::TY_CUDA;
+});
+
   // Construct the list of abstract actions to perform for this compilation. On
   // MachO targets this uses the driver-driver and universal actions.
   if (TC.getTriple().isOSBinFormatMachO())
@@ -638,7 +649,7 @@
 
   // Generate preprocessed output.
   SmallVector<std::pair<int, const Command *>, 4> FailingCommands;
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), /* StopOnFailure = */ false, FailingCommands);
 
   // If any of the preprocessing commands failed, clean up and exit.
   if (!FailingCommands.empty()) {
@@ -730,7 +741,7 @@
   for (auto &Job : C.getJobs())
 setUpResponseFiles(C, Job);
 
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), StopOnJobFailure, FailingCommands);
 
   // Remove temp files.
   C.CleanupFileList(C.getTempFiles());
Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilation.cpp
@@ -188,14 +188,17 @@
   return !ActionFailed(&C.getSource(), FailingCommands);
 }
 
-void Compilation::ExecuteJobs(const JobList &Jobs,
+void Compilation::ExecuteJobs(const JobList &Jobs, bool StopOnFailure,
                               FailingCommandList &FailingCommands) const {
   for (const auto &Job : Jobs) {
 if (!InputsOk(Job, FailingCommands))
   continue;
 const Command *FailingCommand = nullptr;
-if (int Res = ExecuteCommand(Job, FailingCommand))
+if (int Res = ExecuteCommand(Job, FailingCommand)) {
   FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+  if (StopOnFailure)
+return;
+}
   }
 }
 
Index: include/clang/Driver/Options.td
===
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -1801,6 +1801,11 @@
 def : Flag<["-"], "no-integrated-as">, Alias<fno_integrated_as>,
   Flags<[CC1Option, DriverOption]>;
 
+def stop_on_failure : Flag<["-"], "stop-on-failure">, Flags<[DriverOption]>,
+  HelpText<"Stop running jobs as soon as one fails.  This is the default during "
+"CUDA compilation without --save-temps.">;
+def no_stop_on_failure : Flag<["-"], "no-stop-on-failure">, Flags<[DriverOption]>;
+
 def working_directory : JoinedOrSeparate<["-"], "working-directory">, Flags<[CC1Option]>,
   HelpText<"Resolve file paths relative to the specified directory">;
 def working_directory_EQ : Joined<["-"], "working-directory=">, Flags<[CC1Option]>,
Index: include/clang/Driver/Driver.h
===
--- include/clang/Driver/Driver.h
+++ include/clang/Driver/Driver.h
@@ -192,6 +192,10 @@
   /// Certain options suppress the 'no input files' warning.
   bool SuppressMissingInputWarning : 1;
 
+  /// Should we stop running all jobs as soon as one fails?  If false, we run as
+  /// much as we can.
+  bool StopOnJobFailure : 1;
+
   std::list<std::string> TempFiles;
   std::list<std::string> ResultFiles;
 
Index: include/clang/Driver/Compilation.h
===
--- include/clang/Driver/Compilation.h
+++ include/clang/Driver/Compilation.h
@@ -193,12 +193,13 @@
   /// \return The result code of the subprocess.
   int ExecuteCommand(const Command &C, const Command *&FailingCommand) const;
 
-  /// ExecuteJob - Execute a single job.
+  /// ExecuteJobs - Execute a list of jobs.
   ///
-  /// \param FailingCommands - For non-zero results, this will be a vector of
-  /// failing commands and their associated result code.

Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Driver/Driver.cpp:650
@@ -638,3 +649,3 @@
   SmallVector<std::pair<int, const Command *>, 4> FailingCommands;
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), /* StopOnFailure = */ false, FailingCommands);
 

tra wrote:
> Why is StopOnFailure is false in this case? Shouldn't it obey command line 
> options, too?
This function is called when the compiler has an internal error or crashes.  
The jobs we're executing here are preprocessor jobs dumping debugging info.  I 
figured we should not stop on failure when outputting that info?


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar marked 3 inline comments as done.
jlebar added a comment.

Thank you for the reviews.

Please have another look; I switched to using a struct proper.  It's a lot 
cleaner!  We're now assuming that the struct is aligned in the same way as 
vprintf wants, but if anything I expect this new code is more likely to match 
what it wants.



Comment at: lib/CodeGen/CGCUDABuiltin.cpp:105-108
@@ -104,2 +104,6 @@
   } else {
-BufferPtr = Builder.Insert(new llvm::AllocaInst(
+// Insert our alloca not into the current BB, but into the function's entry
+// block.  This is important because nvvm doesn't support alloca -- if we
+// put the alloca anywhere else, llvm may eventually output
+// stacksave/stackrestore intrinsics, which cause our nvvm backend to choke.
+auto *Alloca = new llvm::AllocaInst(

rnk wrote:
> The fact that allocas for local variables should always go in the entry block 
> is pretty widespread cultural knowledge in LLVM and clang. Most readers 
> aren't going to need this comment, unless you expect that people working on 
> CUDA won't have that background. Plus, if you use CreateTempAlloca, there 
> won't be any question about which insert point should be used.
OK, yeah, I also don't like comments that explain something that everyone other 
than the author knows.  Thanks.


http://reviews.llvm.org/D16664



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar added inline comments.


Comment at: lib/Driver/Driver.cpp:652
@@ -640,3 +651,3 @@
   SmallVector<std::pair<int, const Command *>, 4> FailingCommands;
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), /* StopOnFailure = */ false, FailingCommands);
 

tra wrote:
> jlebar wrote:
> > tra wrote:
> > > As far as I can tell, we don't do anything interesting if we've detected 
> > > that *any* of the commands have failed. That suggests that doing anything 
> > > beyond the first failing command does not do us any good. That would 
> > > suggest that we may really want StopOnFailure=true here.
> > > 
> > > 'false' would preserve current behavior, though.
> > > 
> > > In either case I'm OK with a constant here.
> > Sorry, I think I'm misunderstanding something.  Would you mind rephrasing 
> > this?
> > 
> > > As far as I can tell, we don't do anything interesting if we've detected 
> > > that *any* of the commands have failed.  That suggests that doing 
> > > anything beyond the first failing command does not do us any good.
> > 
> > The scenario I thought this change applied to was:
> > 
> > External tool crashes during a call to ExecuteJobs() (not this one).  We 
> > now want to output preprocessed inputs, so we run this code, which again 
> > calls ExecuteJobs(), but these jobs only run the preprocessor on the inputs.
> > 
> > Now suppose one of those preprocessor jobs fails.  Maybe it has a bad 
> > preprocessor directive, or maybe #error would be enough.  It seems to me in 
> > this case that we should continue running the other preprocessor jobs, so 
> > we dump as much debug info as we can.
> > 
> > Note that if the StopOnFailure flag is false, afaict it's entirely possible 
> > for us to have two inputs, one of which has a pp error and the other of 
> > which causes a compiler crash -- if we stopped on failure here, we wouldn't 
> > output anything for the second input, which is the one we're interested in.
> > 
> > Sorry again, I'm sure I'm missing something.
> Look at the lines below. If there are any failing commands we just report an 
> error and return.
> Even if there are multiple preprocessor jobs and if some of them succeed, we 
> would not get to use their output.
> 
Oh.

Thanks.  :)


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 46314.
jlebar marked an inline comment as done.
jlebar added a comment.

Use a struct rather than an i8 buffer.
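
Conceptually (a sketch mirroring the comment in the diff below; names are
illustrative):

```
// printf("%d %lld %f", 1, 2ll, 3.0) is now lowered roughly as:
struct printf_args { int a1; long long a2; double a3; };
printf_args buf = {1, 2ll, 3.0};
vprintf("%d %lld %f", reinterpret_cast<const char *>(&buf));
```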


http://reviews.llvm.org/D16664

Files:
  lib/CodeGen/CGCUDABuiltin.cpp
  test/CodeGenCUDA/printf.cu

Index: test/CodeGenCUDA/printf.cu
===
--- test/CodeGenCUDA/printf.cu
+++ test/CodeGenCUDA/printf.cu
@@ -9,45 +9,35 @@
 extern "C" __device__ int vprintf(const char*, const char*);
 
 // Check a simple call to printf end-to-end.
+// CHECK: [[SIMPLE_PRINTF_TY:%[a-zA-Z0-9_]+]] = type { i32, i64, double }
 __device__ int CheckSimple() {
+  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca [[SIMPLE_PRINTF_TY]]
   // CHECK: [[FMT:%[0-9]+]] = load{{.*}}%fmt
-  const char* fmt = "%d";
-  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca i8, i32 4, align 4
-  // CHECK: [[PTR:%[0-9]+]] = getelementptr i8, i8* [[BUF]], i32 0
-  // CHECK: [[CAST:%[0-9]+]] = bitcast i8* [[PTR]] to i32*
-  // CHECK: store i32 42, i32* [[CAST]], align 4
-  // CHECK: [[RET:%[0-9]+]] = call i32 @vprintf(i8* [[FMT]], i8* [[BUF]])
+  const char* fmt = "%d %lld %f";
+  // CHECK: [[PTR0:%[0-9]+]] = getelementptr inbounds [[SIMPLE_PRINTF_TY]], [[SIMPLE_PRINTF_TY]]* [[BUF]], i32 0, i32 0
+  // CHECK: store i32 1, i32* [[PTR0]], align 4
+  // CHECK: [[PTR1:%[0-9]+]] = getelementptr inbounds [[SIMPLE_PRINTF_TY]], [[SIMPLE_PRINTF_TY]]* [[BUF]], i32 0, i32 1
+  // CHECK: store i64 2, i64* [[PTR1]], align 8
+  // CHECK: [[PTR2:%[0-9]+]] = getelementptr inbounds [[SIMPLE_PRINTF_TY]], [[SIMPLE_PRINTF_TY]]* [[BUF]], i32 0, i32 2
+  // CHECK: store double 3.0{{[^,]*}}, double* [[PTR2]], align 8
+  // CHECK: [[BUF_CAST:%[0-9]+]] = bitcast [[SIMPLE_PRINTF_TY]]* [[BUF]] to i8*
+  // CHECK: [[RET:%[0-9]+]] = call i32 @vprintf(i8* [[FMT]], i8* [[BUF_CAST]])
   // CHECK: ret i32 [[RET]]
-  return printf(fmt, 42);
-}
-
-// Check that the args' types are promoted correctly when we call printf.
-__device__ void CheckTypes() {
-  // CHECK: alloca {{.*}} align 8
-  // CHECK: getelementptr {{.*}} i32 0
-  // CHECK: bitcast {{.*}} to i32*
-  // CHECK: getelementptr {{.*}} i32 4
-  // CHECK: bitcast {{.*}} to i32*
-  // CHECK: getelementptr {{.*}} i32 8
-  // CHECK: bitcast {{.*}} to double*
-  // CHECK: getelementptr {{.*}} i32 16
-  // CHECK: bitcast {{.*}} to double*
-  printf("%d %d %f %f", (char)1, (short)2, 3.0f, 4.0);
-}
-
-// Check that the args are aligned properly in the buffer.
-__device__ void CheckAlign() {
-  // CHECK: alloca i8, i32 40, align 8
-  // CHECK: getelementptr {{.*}} i32 0
-  // CHECK: getelementptr {{.*}} i32 8
-  // CHECK: getelementptr {{.*}} i32 16
-  // CHECK: getelementptr {{.*}} i32 20
-  // CHECK: getelementptr {{.*}} i32 24
-  // CHECK: getelementptr {{.*}} i32 32
-  printf("%d %f %d %d %d %lld", 1, 2.0, 3, 4, 5, (long long)6);
+  return printf(fmt, 1, 2ll, 3.0);
 }
 
 __device__ void CheckNoArgs() {
   // CHECK: call i32 @vprintf({{.*}}, i8* null){{$}}
   printf("hello, world!");
 }
+
+// Check that printf's alloca happens in the entry block, not inside the if
+// statement.
+__device__ bool foo();
+__device__ void CheckAllocaIsInEntryBlock() {
+  // CHECK: alloca %printf_args
+  // CHECK: call {{.*}} @_Z3foov()
+  if (foo()) {
+printf("%d", 42);
+  }
+}
Index: lib/CodeGen/CGCUDABuiltin.cpp
===
--- lib/CodeGen/CGCUDABuiltin.cpp
+++ lib/CodeGen/CGCUDABuiltin.cpp
@@ -52,10 +52,13 @@
 //
 // is converted into something resembling
 //
-//   char* buf = alloca(...);
-//   *reinterpret_cast<Arg1*>(buf) = arg1;
-//   *reinterpret_cast<Arg2*>(buf + ...) = arg2;
-//   *reinterpret_cast<Arg3*>(buf + ...) = arg3;
+//   struct Tmp {
+// Arg1 a1;
+// Arg2 a2;
+// Arg3 a3;
+//   };
+//   char* buf = alloca(sizeof(Tmp));
+//   *(Tmp*)buf = {a1, a2, a3};
 //   vprintf("format string", buf);
 //
 // buf is aligned to the max of {alignof(Arg1), ...}.  Furthermore, each of the
@@ -80,48 +83,24 @@
E->arguments(), E->getDirectCallee(),
/* ParamsToSkip = */ 0);
 
-  // Figure out how large of a buffer we need to hold our varargs and how
-  // aligned the buffer needs to be.  We start iterating at Arg[1], because
-  // that's our first vararg.
-  unsigned BufSize = 0;
-  unsigned BufAlign = 0;
-  for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
-const RValue& RV = Args[I].RV;
-llvm::Type* Ty = RV.getScalarVal()->getType();
-
-auto Align = DL.getPrefTypeAlignment(Ty);
-BufAlign = std::max(BufAlign, Align);
-// Add padding required to keep the current arg aligned.
-BufSize = llvm::alignTo(BufSize, Align);
-BufSize += DL.getTypeAllocSize(Ty);
-  }
-
-  // Construct and fill the buffer.
-  llvm::Value* BufferPtr = nullptr;
-  if (BufSize == 0) {
+  // Construct and fill the args buffer that we'll pass to vprintf.
+  llvm::Value* BufferPtr;
+  if (Args.size() <= 1) {
 // If there are no args, pass a null pointer to vprintf.

Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 46315.
jlebar marked 3 inline comments as done.
jlebar added a comment.

Pass StopOnFailure = true when running the preprocessor after an ICE.


http://reviews.llvm.org/D16514

Files:
  include/clang/Driver/Compilation.h
  include/clang/Driver/Driver.h
  include/clang/Driver/Options.td
  lib/Driver/Compilation.cpp
  lib/Driver/Driver.cpp

Index: lib/Driver/Driver.cpp
===
--- lib/Driver/Driver.cpp
+++ lib/Driver/Driver.cpp
@@ -58,7 +58,8 @@
   CCPrintHeadersFilename(nullptr), CCLogDiagnosticsFilename(nullptr),
   CCCPrintBindings(false), CCPrintHeaders(false), CCLogDiagnostics(false),
   CCGenDiagnostics(false), CCCGenericGCCName(""), CheckInputsExist(true),
-  CCCUsePCH(true), SuppressMissingInputWarning(false) {
+  CCCUsePCH(true), SuppressMissingInputWarning(false),
+  StopOnJobFailure(false) {
 
   // Provide a sane fallback if no VFS is specified.
   if (!this->VFS)
@@ -505,6 +506,16 @@
   InputList Inputs;
   BuildInputs(C->getDefaultToolChain(), *TranslatedArgs, Inputs);
 
+  // StopOnJobFailure defaults to false, except for CUDA compilations.
+  if (Arg *A = C->getArgs().getLastArg(options::OPT_stop_on_failure,
+   options::OPT_no_stop_on_failure))
+StopOnJobFailure = A->getOption().matches(options::OPT_stop_on_failure);
+  else
+StopOnJobFailure =
+llvm::any_of(Inputs, [](const std::pair<types::ID, const Arg *> &I) {
+  return I.first == types::TY_CUDA;
+});
+
   // Construct the list of abstract actions to perform for this compilation. On
   // MachO targets this uses the driver-driver and universal actions.
   if (TC.getTriple().isOSBinFormatMachO())
@@ -638,7 +649,9 @@
 
   // Generate preprocessed output.
   SmallVector<std::pair<int, const Command *>, 4> FailingCommands;
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  // Might as well pass StopOnFailure = true; if any of the commands fails, we
+  // don't output anything at all.
+  C.ExecuteJobs(C.getJobs(), /* StopOnFailure = */ true, FailingCommands);
 
   // If any of the preprocessing commands failed, clean up and exit.
   if (!FailingCommands.empty()) {
@@ -730,7 +743,7 @@
   for (auto &Job : C.getJobs())
     setUpResponseFiles(C, Job);
 
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), StopOnJobFailure, FailingCommands);
 
   // Remove temp files.
   C.CleanupFileList(C.getTempFiles());
Index: lib/Driver/Compilation.cpp
===
--- lib/Driver/Compilation.cpp
+++ lib/Driver/Compilation.cpp
@@ -188,14 +188,17 @@
   return !ActionFailed(&C.getSource(), FailingCommands);
 }
 
-void Compilation::ExecuteJobs(const JobList &Jobs,
+void Compilation::ExecuteJobs(const JobList &Jobs, bool StopOnFailure,
                               FailingCommandList &FailingCommands) const {
   for (const auto &Job : Jobs) {
     if (!InputsOk(Job, FailingCommands))
       continue;
     const Command *FailingCommand = nullptr;
-    if (int Res = ExecuteCommand(Job, FailingCommand))
+    if (int Res = ExecuteCommand(Job, FailingCommand)) {
       FailingCommands.push_back(std::make_pair(Res, FailingCommand));
+      if (StopOnFailure)
+        return;
+    }
   }
 }
 
Index: include/clang/Driver/Options.td
===
--- include/clang/Driver/Options.td
+++ include/clang/Driver/Options.td
@@ -1801,6 +1801,11 @@
 def : Flag<["-"], "no-integrated-as">, Alias,
   Flags<[CC1Option, DriverOption]>;
 
+def stop_on_failure : Flag<["-"], "stop-on-failure">, Flags<[DriverOption]>,
+  HelpText<"Stop running jobs as soon as one fails.  This is the default during "
+"CUDA compilation without --save-temps.">;
+def no_stop_on_failure : Flag<["-"], "no-stop-on-failure">, Flags<[DriverOption]>;
+
 def working_directory : JoinedOrSeparate<["-"], "working-directory">, Flags<[CC1Option]>,
   HelpText<"Resolve file paths relative to the specified directory">;
 def working_directory_EQ : Joined<["-"], "working-directory=">, Flags<[CC1Option]>,
Index: include/clang/Driver/Driver.h
===
--- include/clang/Driver/Driver.h
+++ include/clang/Driver/Driver.h
@@ -192,6 +192,10 @@
   /// Certain options suppress the 'no input files' warning.
   bool SuppressMissingInputWarning : 1;
 
+  /// Should we stop running all jobs as soon as one fails?  If false, we run as
+  /// much as we can.
+  bool StopOnJobFailure : 1;
+
  std::list<std::string> TempFiles;
  std::list<std::string> ResultFiles;
 
Index: include/clang/Driver/Compilation.h
===
--- include/clang/Driver/Compilation.h
+++ include/clang/Driver/Compilation.h
@@ -193,12 +193,13 @@
   /// \return The result code of the subprocess.
  int ExecuteCommand(const Command &C, const Command *&FailingCommand) const;
 
-  /// ExecuteJob - Execute a single job.
+  

Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar added a comment.

In http://reviews.llvm.org/D16514#338631, @echristo wrote:

> In general it feels like keeping 2 errors might make the most sense:
>
> #if _NOT_ARCH4_
>  #error "aiee!"
>  #endif
>
> clang -arch arch1 -arch arch2 -arch arch3 -arch arch4 t.c
>
> seems like it might be nice to get 3 errors here rather than a single one and 
> fixing that single one, then getting another one, etc. or realizing what the 
> error is here.


Yes, this patch makes that case worse.

But I suspect errors that apply to some but not all archs will be far less 
common than errors that apply to all archs -- regular C++ errors like missing 
a semicolon or whatever.  It feels pretty overwhelming to output N copies of 
every error in those cases, especially when you consider multipage template 
errors.

In addition, iirc there's no separation between errors output for different 
archs, so it really looks like we're just outputting multiple copies of the 
errors for fun.

> I don't feel strongly about this, but I'm still uncertain as to why we want 
> to make things more complicated here :)


The other reason, which is less important, is that when you have one arch and 
ptxas fails -- which, it shouldn't, but we're not good enough to catch 
everything yet, and likely won't be for some time -- the error you get is

  ptxas: foo is not defined
  *FATAL ERROR*: fatbinary failed, /tmp/bar.cubin does not exist.

I'd like not to display that second line, since it hides the actual problem.  
Once you get used to it, it's not a big deal, but it tripped me up for a few 
minutes, and I'm the one who added the call to ptxas.
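
To make the difference concrete, a sketch of the two behaviors (hypothetical
invocation; foo and bar.cubin reuse the example above):

  # Default for CUDA compilations after this patch: stop at the first failure.
  $ clang foo.cu
  ptxas: foo is not defined

  # With -no-stop-on-failure, the dependent fatbinary job still runs and
  # buries the real problem:
  $ clang -no-stop-on-failure foo.cu
  ptxas: foo is not defined
  *FATAL ERROR*: fatbinary failed, /tmp/bar.cubin does not exist.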



Comment at: lib/Driver/Driver.cpp:652
@@ -640,3 +651,3 @@
   SmallVector<std::pair<int, const Command *>, 4> FailingCommands;
-  C.ExecuteJobs(C.getJobs(), FailingCommands);
+  C.ExecuteJobs(C.getJobs(), /* StopOnFailure = */ false, FailingCommands);
 

tra wrote:
> As far as I can tell, we don't do anything interesting if we've detected that 
> *any* of the commands have failed. That suggests that doing anything beyond 
> the first failing command does not do us any good. That would suggest that we 
> may really want StopOnFailure=true here.
> 
> 'false' would preserve current behavior, though.
> 
> In either case I'm OK with a constant here.
Sorry, I think I'm misunderstanding something.  Would you mind rephrasing this?

> As far as I can tell, we don't do anything interesting if we've detected that 
> *any* of the commands have failed.  That suggests that doing anything beyond 
> the first failing command does not do us any good.

The scenario I thought this change applied to was:

External tool crashes during a call to ExecuteJobs() (not this one).  We now 
want to output preprocessed inputs, so we run this code, which again calls 
ExecuteJobs(), but these jobs only run the preprocessor on the inputs.

Now suppose one of those preprocessor jobs fails.  Maybe it has a bad 
preprocessor directive, or maybe #error would be enough.  It seems to me in 
this case that we should continue running the other preprocessor jobs, so we 
dump as much debug info as we can.

Note that if the StopOnFailure flag is false, afaict it's entirely possible for 
us to have two inputs, one of which has a pp error and the other of which 
causes a compiler crash -- if we stopped on failure here, we wouldn't output 
anything for the second input, which is the one we're interested in.

Sorry again, I'm sure I'm missing something.


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
Hm, well, 
https://llvm.org/svn/llvm-project/cfe/trunk/tools/clang-format/git-clang-format
is close...  Not sure if that triggers the bff clause, will consult my
attorney.

On Thu, Jan 28, 2016 at 4:09 PM, Justin Lebar  wrote:
> Do you have a script that will take as input a commit range and git
> commit --amend clang-tidy fixes for lines modified in that range?
> Because if so,
>
> a) I would be your best friend forever, and
> b) It should be simple to convert that into a linter for arc to catch
> the case when I forget to run said tool.
>
> On Thu, Jan 28, 2016 at 4:06 PM, Eric Christopher  wrote:
>> echristo added inline comments.
>>
>> 
>> Comment at: lib/CodeGen/CGCUDABuiltin.cpp:87
>> @@ +86,3 @@
>> +  // Construct and fill the args buffer that we'll pass to vprintf.
>> +  llvm::Value* BufferPtr;
>> +  if (Args.size() <= 1) {
>> 
>> jlebar wrote:
>>> echristo wrote:
>>> > * on the wrong side ;)
>>> Argh, I really need to set up a linter.  I'm still doing readability 
>>> reviews, and I cannot brain two styles.  Sorry to keep wasting your time 
>>> with silly stuff like this.
>> You could just use clang-format on everything :)
>>
>>
>> Repository:
>>   rL LLVM
>>
>> http://reviews.llvm.org/D16664
>>
>>
>>
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r259122 - [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Thu Jan 28 17:58:28 2016
New Revision: 259122

URL: http://llvm.org/viewvc/llvm-project?rev=259122&view=rev
Log:
[CUDA] Generate CUDA's printf alloca in its function's entry block.

Summary:
This is necessary to prevent llvm from generating stacksave intrinsics
around this alloca.  NVVM doesn't have a stack, and we don't handle said
intrinsics.
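
As a sketch of the failure mode (illustrative device code, not from the
patch): before this change, the args buffer for a printf inside a branch or
loop was alloca'd in that block, and llvm's optimizers could then wrap it in
stacksave/stackrestore, which the NVVM backend cannot lower.

  // Assumes CUDA's device-side printf is available.
  __device__ void emit_all(const int *v, int n) {
    for (int i = 0; i < n; ++i)
      printf("%d\n", v[i]);  // args alloca is now hoisted to the entry block
  }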

Reviewers: rnk, echristo

Subscribers: cfe-commits, jhen, tra

Differential Revision: http://reviews.llvm.org/D16664

Modified:
cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
cfe/trunk/test/CodeGenCUDA/printf.cu

Modified: cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp?rev=259122&r1=259121&r2=259122&view=diff
==
--- cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp Thu Jan 28 17:58:28 2016
@@ -52,10 +52,13 @@ static llvm::Function *GetVprintfDeclara
 //
 // is converted into something resembling
 //
-//   char* buf = alloca(...);
-//   *reinterpret_cast(buf) = arg1;
-//   *reinterpret_cast(buf + ...) = arg2;
-//   *reinterpret_cast(buf + ...) = arg3;
+//   struct Tmp {
+// Arg1 a1;
+// Arg2 a2;
+// Arg3 a3;
+//   };
+//   char* buf = alloca(sizeof(Tmp));
+//   *(Tmp*)buf = {a1, a2, a3};
 //   vprintf("format string", buf);
 //
 // buf is aligned to the max of {alignof(Arg1), ...}.  Furthermore, each of the
@@ -80,48 +83,24 @@ CodeGenFunction::EmitCUDADevicePrintfCal
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
-  // Figure out how large of a buffer we need to hold our varargs and how
-  // aligned the buffer needs to be.  We start iterating at Arg[1], because
-  // that's our first vararg.
-  unsigned BufSize = 0;
-  unsigned BufAlign = 0;
-  for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
-    const RValue& RV = Args[I].RV;
-    llvm::Type* Ty = RV.getScalarVal()->getType();
-
-    auto Align = DL.getPrefTypeAlignment(Ty);
-    BufAlign = std::max(BufAlign, Align);
-    // Add padding required to keep the current arg aligned.
-    BufSize = llvm::alignTo(BufSize, Align);
-    BufSize += DL.getTypeAllocSize(Ty);
-  }
-
-  // Construct and fill the buffer.
-  llvm::Value* BufferPtr = nullptr;
-  if (BufSize == 0) {
+  // Construct and fill the args buffer that we'll pass to vprintf.
+  llvm::Value *BufferPtr;
+  if (Args.size() <= 1) {
     // If there are no args, pass a null pointer to vprintf.
     BufferPtr = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx));
   } else {
-    BufferPtr = Builder.Insert(new llvm::AllocaInst(
-        llvm::Type::getInt8Ty(Ctx), llvm::ConstantInt::get(Int32Ty, BufSize),
-        BufAlign, "printf_arg_buf"));
+    llvm::SmallVector<llvm::Type *, 8> ArgTypes;
+    for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I)
+      ArgTypes.push_back(Args[I].RV.getScalarVal()->getType());
+    llvm::Type *AllocaTy = llvm::StructType::create(ArgTypes, "printf_args");
+    llvm::Value *Alloca = CreateTempAlloca(AllocaTy);
 
-    unsigned Offset = 0;
     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
+      llvm::Value *P = Builder.CreateStructGEP(AllocaTy, Alloca, I - 1);
       llvm::Value *Arg = Args[I].RV.getScalarVal();
-      llvm::Type *Ty = Arg->getType();
-      auto Align = DL.getPrefTypeAlignment(Ty);
-
-      // Pad the buffer to Arg's alignment.
-      Offset = llvm::alignTo(Offset, Align);
-
-      // Store Arg into the buffer at Offset.
-      llvm::Value *GEP =
-          Builder.CreateGEP(BufferPtr, llvm::ConstantInt::get(Int32Ty, Offset));
-      llvm::Value *Cast = Builder.CreateBitCast(GEP, Ty->getPointerTo());
-      Builder.CreateAlignedStore(Arg, Cast, Align);
-      Offset += DL.getTypeAllocSize(Ty);
+      Builder.CreateAlignedStore(Arg, P, DL.getPrefTypeAlignment(Arg->getType()));
     }
+    BufferPtr = Builder.CreatePointerCast(Alloca, llvm::Type::getInt8PtrTy(Ctx));
   }
 
   // Invoke vprintf and return.

Modified: cfe/trunk/test/CodeGenCUDA/printf.cu
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/printf.cu?rev=259122&r1=259121&r2=259122&view=diff
==
--- cfe/trunk/test/CodeGenCUDA/printf.cu (original)
+++ cfe/trunk/test/CodeGenCUDA/printf.cu Thu Jan 28 17:58:28 2016
@@ -9,45 +9,35 @@
 extern "C" __device__ int vprintf(const char*, const char*);
 
 // Check a simple call to printf end-to-end.
+// CHECK: [[SIMPLE_PRINTF_TY:%[a-zA-Z0-9_]+]] = type { i32, i64, double }
 __device__ int CheckSimple() {
+  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca [[SIMPLE_PRINTF_TY]]
   // CHECK: [[FMT:%[0-9]+]] = load{{.*}}%fmt
-  const char* fmt = "%d";
-  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca i8, i32 4, align 4
-  // CHECK: [[PTR:%[0-9]+]] = getelementptr i8, i8* [[BUF]], i32 0
-  // CHECK: [[CAST:%[0-9]+]] = bitcast i8* [[PTR]] to i32*

Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar marked an inline comment as done.


Comment at: lib/CodeGen/CGCUDABuiltin.cpp:87
@@ +86,3 @@
+  // Construct and fill the args buffer that we'll pass to vprintf.
+  llvm::Value* BufferPtr;
+  if (Args.size() <= 1) {

echristo wrote:
> * on the wrong side ;)
Argh, I really need to set up a linter.  I'm still doing readability reviews, 
and I cannot brain two styles.  Sorry to keep wasting your time with silly 
stuff like this.


http://reviews.llvm.org/D16664



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.
Closed by commit rL259122: [CUDA] Generate CUDA's printf alloca in its 
function's entry block. (authored by jlebar).

Changed prior to commit:
  http://reviews.llvm.org/D16664?vs=46314&id=46323#toc

Repository:
  rL LLVM

http://reviews.llvm.org/D16664

Files:
  cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
  cfe/trunk/test/CodeGenCUDA/printf.cu

Index: cfe/trunk/test/CodeGenCUDA/printf.cu
===
--- cfe/trunk/test/CodeGenCUDA/printf.cu
+++ cfe/trunk/test/CodeGenCUDA/printf.cu
@@ -9,45 +9,35 @@
 extern "C" __device__ int vprintf(const char*, const char*);
 
 // Check a simple call to printf end-to-end.
+// CHECK: [[SIMPLE_PRINTF_TY:%[a-zA-Z0-9_]+]] = type { i32, i64, double }
 __device__ int CheckSimple() {
+  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca [[SIMPLE_PRINTF_TY]]
   // CHECK: [[FMT:%[0-9]+]] = load{{.*}}%fmt
-  const char* fmt = "%d";
-  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca i8, i32 4, align 4
-  // CHECK: [[PTR:%[0-9]+]] = getelementptr i8, i8* [[BUF]], i32 0
-  // CHECK: [[CAST:%[0-9]+]] = bitcast i8* [[PTR]] to i32*
-  // CHECK: store i32 42, i32* [[CAST]], align 4
-  // CHECK: [[RET:%[0-9]+]] = call i32 @vprintf(i8* [[FMT]], i8* [[BUF]])
+  const char* fmt = "%d %lld %f";
+  // CHECK: [[PTR0:%[0-9]+]] = getelementptr inbounds [[SIMPLE_PRINTF_TY]], [[SIMPLE_PRINTF_TY]]* [[BUF]], i32 0, i32 0
+  // CHECK: store i32 1, i32* [[PTR0]], align 4
+  // CHECK: [[PTR1:%[0-9]+]] = getelementptr inbounds [[SIMPLE_PRINTF_TY]], [[SIMPLE_PRINTF_TY]]* [[BUF]], i32 0, i32 1
+  // CHECK: store i64 2, i64* [[PTR1]], align 8
+  // CHECK: [[PTR2:%[0-9]+]] = getelementptr inbounds [[SIMPLE_PRINTF_TY]], [[SIMPLE_PRINTF_TY]]* [[BUF]], i32 0, i32 2
+  // CHECK: store double 3.0{{[^,]*}}, double* [[PTR2]], align 8
+  // CHECK: [[BUF_CAST:%[0-9]+]] = bitcast [[SIMPLE_PRINTF_TY]]* [[BUF]] to i8*
+  // CHECK: [[RET:%[0-9]+]] = call i32 @vprintf(i8* [[FMT]], i8* [[BUF_CAST]])
   // CHECK: ret i32 [[RET]]
-  return printf(fmt, 42);
-}
-
-// Check that the args' types are promoted correctly when we call printf.
-__device__ void CheckTypes() {
-  // CHECK: alloca {{.*}} align 8
-  // CHECK: getelementptr {{.*}} i32 0
-  // CHECK: bitcast {{.*}} to i32*
-  // CHECK: getelementptr {{.*}} i32 4
-  // CHECK: bitcast {{.*}} to i32*
-  // CHECK: getelementptr {{.*}} i32 8
-  // CHECK: bitcast {{.*}} to double*
-  // CHECK: getelementptr {{.*}} i32 16
-  // CHECK: bitcast {{.*}} to double*
-  printf("%d %d %f %f", (char)1, (short)2, 3.0f, 4.0);
-}
-
-// Check that the args are aligned properly in the buffer.
-__device__ void CheckAlign() {
-  // CHECK: alloca i8, i32 40, align 8
-  // CHECK: getelementptr {{.*}} i32 0
-  // CHECK: getelementptr {{.*}} i32 8
-  // CHECK: getelementptr {{.*}} i32 16
-  // CHECK: getelementptr {{.*}} i32 20
-  // CHECK: getelementptr {{.*}} i32 24
-  // CHECK: getelementptr {{.*}} i32 32
-  printf("%d %f %d %d %d %lld", 1, 2.0, 3, 4, 5, (long long)6);
+  return printf(fmt, 1, 2ll, 3.0);
 }
 
 __device__ void CheckNoArgs() {
   // CHECK: call i32 @vprintf({{.*}}, i8* null){{$}}
   printf("hello, world!");
 }
+
+// Check that printf's alloca happens in the entry block, not inside the if
+// statement.
+__device__ bool foo();
+__device__ void CheckAllocaIsInEntryBlock() {
+  // CHECK: alloca %printf_args
+  // CHECK: call {{.*}} @_Z3foov()
+  if (foo()) {
+    printf("%d", 42);
+  }
+}
Index: cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
===
--- cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
+++ cfe/trunk/lib/CodeGen/CGCUDABuiltin.cpp
@@ -52,10 +52,13 @@
 //
 // is converted into something resembling
 //
-//   char* buf = alloca(...);
-//   *reinterpret_cast(buf) = arg1;
-//   *reinterpret_cast(buf + ...) = arg2;
-//   *reinterpret_cast(buf + ...) = arg3;
+//   struct Tmp {
+// Arg1 a1;
+// Arg2 a2;
+// Arg3 a3;
+//   };
+//   char* buf = alloca(sizeof(Tmp));
+//   *(Tmp*)buf = {a1, a2, a3};
 //   vprintf("format string", buf);
 //
 // buf is aligned to the max of {alignof(Arg1), ...}.  Furthermore, each of the
@@ -80,48 +83,24 @@
                E->arguments(), E->getDirectCallee(),
                /* ParamsToSkip = */ 0);
 
-  // Figure out how large of a buffer we need to hold our varargs and how
-  // aligned the buffer needs to be.  We start iterating at Arg[1], because
-  // that's our first vararg.
-  unsigned BufSize = 0;
-  unsigned BufAlign = 0;
-  for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
-    const RValue& RV = Args[I].RV;
-    llvm::Type* Ty = RV.getScalarVal()->getType();
-
-    auto Align = DL.getPrefTypeAlignment(Ty);
-    BufAlign = std::max(BufAlign, Align);
-    // Add padding required to keep the current arg aligned.
-    BufSize = llvm::alignTo(BufSize, Align);
-    BufSize += DL.getTypeAllocSize(Ty);
-  }
-
-  

Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
Do you have a script that will take as input a commit range and git
commit --amend clang-tidy fixes for lines modified in that range?
Because if so,

a) I would be your best friend forever, and
b) It should be simple to convert that into a linter for arc to catch
the case when I forget to run said tool.

On Thu, Jan 28, 2016 at 4:06 PM, Eric Christopher  wrote:
> echristo added inline comments.
>
> 
> Comment at: lib/CodeGen/CGCUDABuiltin.cpp:87
> @@ +86,3 @@
> +  // Construct and fill the args buffer that we'll pass to vprintf.
> +  llvm::Value* BufferPtr;
> +  if (Args.size() <= 1) {
> 
> jlebar wrote:
>> echristo wrote:
>> > * on the wrong side ;)
>> Argh, I really need to set up a linter.  I'm still doing readability 
>> reviews, and I cannot brain two styles.  Sorry to keep wasting your time 
>> with silly stuff like this.
> You could just use clang-format on everything :)
>
>
> Repository:
>   rL LLVM
>
> http://reviews.llvm.org/D16664
>
>
>
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-28 Thread Justin Lebar via cfe-commits
jlebar updated this revision to Diff 46293.
jlebar added a comment.

Address echristo's review comments.


http://reviews.llvm.org/D16664

Files:
  lib/CodeGen/CGCUDABuiltin.cpp
  test/CodeGenCUDA/printf.cu

Index: test/CodeGenCUDA/printf.cu
===
--- test/CodeGenCUDA/printf.cu
+++ test/CodeGenCUDA/printf.cu
@@ -10,9 +10,9 @@
 
 // Check a simple call to printf end-to-end.
 __device__ int CheckSimple() {
+  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca i8, i32 4, align 4
   // CHECK: [[FMT:%[0-9]+]] = load{{.*}}%fmt
   const char* fmt = "%d";
-  // CHECK: [[BUF:%[a-zA-Z0-9_]+]] = alloca i8, i32 4, align 4
   // CHECK: [[PTR:%[0-9]+]] = getelementptr i8, i8* [[BUF]], i32 0
   // CHECK: [[CAST:%[0-9]+]] = bitcast i8* [[PTR]] to i32*
   // CHECK: store i32 42, i32* [[CAST]], align 4
@@ -51,3 +51,14 @@
   // CHECK: call i32 @vprintf({{.*}}, i8* null){{$}}
   printf("hello, world!");
 }
+
+// Check that printf's alloca happens in the entry block, not inside the if
+// statement.
+__device__ bool foo();
+__device__ void CheckAllocaIsInEntryBlock() {
+  // CHECK: alloca i8, i32 4, align 4
+  // CHECK: call {{.*}} @_Z3foov()
+  if (foo()) {
+    printf("%d", 42);
+  }
+}
Index: lib/CodeGen/CGCUDABuiltin.cpp
===
--- lib/CodeGen/CGCUDABuiltin.cpp
+++ lib/CodeGen/CGCUDABuiltin.cpp
@@ -102,9 +102,15 @@
     // If there are no args, pass a null pointer to vprintf.
     BufferPtr = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx));
   } else {
-    BufferPtr = Builder.Insert(new llvm::AllocaInst(
+    // Insert our alloca not into the current BB, but into the function's entry
+    // block.  This is important because nvvm doesn't support alloca -- if we
+    // put the alloca anywhere else, llvm may eventually output
+    // stacksave/stackrestore intrinsics, which cause our nvvm backend to choke.
+    auto *Alloca = new llvm::AllocaInst(
         llvm::Type::getInt8Ty(Ctx), llvm::ConstantInt::get(Int32Ty, BufSize),
-        BufAlign, "printf_arg_buf"));
+        BufAlign, "printf_arg_buf");
+    Alloca->insertAfter(AllocaInsertPt);
+    BufferPtr = Alloca;
 
     unsigned Offset = 0;
     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16514: Add -stop-on-failure driver option, and enable it by default for CUDA compiles.

2016-01-27 Thread Justin Lebar via cfe-commits
jlebar added a comment.

Friendly ping.


http://reviews.llvm.org/D16514



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D16664: [CUDA] Generate CUDA's printf alloca in its function's entry block.

2016-01-27 Thread Justin Lebar via cfe-commits
jlebar created this revision.
jlebar added a reviewer: rnk.
jlebar added subscribers: tra, echristo, jhen, cfe-commits.

This is necessary to prevent llvm from generating stacksave intrinsics
around this alloca.  NVVM doesn't have a stack, and we don't handle said
intrinsics.

I'm not sure if appending the alloca to the beginning of the entry block is
right.  Adding it to the end would make more sense to me, but then I'm not sure
how to ensure I'm not clobbering the terminator (except by always assuming the
BB is nonempty and inserting right before BB.back()?).
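
For comparison, the two insertion strategies that came up in this review;
both lines appear verbatim in the revisions in this thread, and the committed
version (diff 46323) uses the second:

  // (a) This initial patch: push the alloca to the front of the entry block.
  CurFn->getEntryBlock().getInstList().push_front(Alloca);

  // (b) The committed revision: insert right after CodeGenFunction's
  //     dedicated alloca insertion point, sidestepping the terminator issue.
  Alloca->insertAfter(AllocaInsertPt);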

http://reviews.llvm.org/D16664

Files:
  lib/CodeGen/CGCUDABuiltin.cpp
  test/CodeGenCUDA/printf.cu

Index: test/CodeGenCUDA/printf.cu
===
--- test/CodeGenCUDA/printf.cu
+++ test/CodeGenCUDA/printf.cu
@@ -51,3 +51,14 @@
   // CHECK: call i32 @vprintf({{.*}}, i8* null){{$}}
   printf("hello, world!");
 }
+
+// Check that printf's alloca happens in the entry block, not inside the if
+// statement.
+__device__ bool foo();
+__device__ void CheckAllocaIsInEntryBlock() {
+  // CHECK: alloca i8, i32 4 align 4
+  // CHECK: call @_Z3foov()
+  if (foo()) {
+    printf("%d", 42);
+  }
+}
Index: lib/CodeGen/CGCUDABuiltin.cpp
===
--- lib/CodeGen/CGCUDABuiltin.cpp
+++ lib/CodeGen/CGCUDABuiltin.cpp
@@ -102,9 +102,15 @@
     // If there are no args, pass a null pointer to vprintf.
     BufferPtr = llvm::ConstantPointerNull::get(llvm::Type::getInt8PtrTy(Ctx));
   } else {
-    BufferPtr = Builder.Insert(new llvm::AllocaInst(
+    // Insert our alloca not into the current BB, but into the function's entry
+    // block.  This is important because nvvm doesn't support alloca -- if we
+    // put the alloca anywhere else, llvm may eventually output
+    // stacksave/stackrestore intrinsics, which cause ptxas to choke.
+    auto *Alloca = new llvm::AllocaInst(
         llvm::Type::getInt8Ty(Ctx), llvm::ConstantInt::get(Int32Ty, BufSize),
-        BufAlign, "printf_arg_buf"));
+        BufAlign, "printf_arg_buf");
+    CurFn->getEntryBlock().getInstList().push_front(Alloca);
+    BufferPtr = Alloca;
 
     unsigned Offset = 0;
     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {


___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


Re: [PATCH] D16638: [CUDA] Added device-side std::{malloc/free}

2016-01-27 Thread Justin Lebar via cfe-commits
jlebar accepted this revision.
This revision is now accepted and ready to land.


Comment at: lib/Headers/__clang_cuda_cmath.h:222
@@ +221,3 @@
+__DEVICE__ void free(void *__ptr) { return ::free(__ptr); }
+__DEVICE__ void *malloc(size_t __size) { return ::malloc(__size); }
+

Not really math stuff; maybe they should live in the main header?


Comment at: lib/Headers/__clang_cuda_runtime_wrapper.h:172
@@ -171,1 +171,3 @@
 
+// We also need extern "C" decls for device-side allocator functions.
+extern "C" __device__ void free(void *__ptr);

Perhaps add a comment explaining where these are implemented?
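
Putting the two review comments together, a sketch of the suggested shape
(hypothetical arrangement; the committed patch may differ):

  // Device-side allocator functions; these are implemented by the CUDA
  // device-side runtime, so we only need matching declarations here.
  extern "C" __device__ void *malloc(size_t __size);
  extern "C" __device__ void free(void *__ptr);

  // Forwarding wrappers so device code can call std::malloc / std::free.
  namespace std {
  __DEVICE__ void free(void *__ptr) { ::free(__ptr); }
  __DEVICE__ void *malloc(size_t __size) { return ::malloc(__size); }
  }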


http://reviews.llvm.org/D16638



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


r258822 - [CUDA] Add -fcuda-allow-variadic-functions.

2016-01-26 Thread Justin Lebar via cfe-commits
Author: jlebar
Date: Tue Jan 26 11:47:20 2016
New Revision: 258822

URL: http://llvm.org/viewvc/llvm-project?rev=258822&view=rev
Log:
[CUDA] Add -fcuda-allow-variadic-functions.

Summary:
Turns out the variadic function checking added in r258643 was too strict
for some existing users; give them an escape valve.  When
-fcuda-allow-variadic-functions is passed, the front-end makes no
attempt to disallow C-style variadic functions.  Calls to va_arg are
still not allowed.
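
Roughly, per the summary above (an illustrative example, not the patch's
actual test; see test/SemaCUDA/vararg.cu below for the real coverage):

  #include <stdarg.h>

  // Accepted under -fcuda-allow-variadic-functions (previously an error):
  __device__ void takes_varargs(const char *fmt, ...) {}

  // Still rejected -- actually reading the arguments requires va_arg:
  __device__ int reads_varargs(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int x = va_arg(ap, int);  // error: va_arg is not allowed in device code
    va_end(ap);
    return x;
  }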

Reviewers: tra

Subscribers: cfe-commits, jhen, echristo, bkramer

Differential Revision: http://reviews.llvm.org/D16559

Modified:
cfe/trunk/include/clang/Basic/LangOptions.def
cfe/trunk/include/clang/Driver/CC1Options.td
cfe/trunk/lib/Frontend/CompilerInvocation.cpp
cfe/trunk/lib/Sema/SemaDecl.cpp
cfe/trunk/test/SemaCUDA/vararg.cu

Modified: cfe/trunk/include/clang/Basic/LangOptions.def
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Basic/LangOptions.def?rev=258822&r1=258821&r2=258822&view=diff
==
--- cfe/trunk/include/clang/Basic/LangOptions.def (original)
+++ cfe/trunk/include/clang/Basic/LangOptions.def Tue Jan 26 11:47:20 2016
@@ -171,6 +171,7 @@ LANGOPT(CUDAIsDevice  , 1, 0, "Compi
 LANGOPT(CUDAAllowHostCallsFromHostDevice, 1, 0, "Allow host device functions to call host functions")
 LANGOPT(CUDADisableTargetCallChecks, 1, 0, "Disable checks for call targets (host, device, etc.)")
 LANGOPT(CUDATargetOverloads, 1, 0, "Enable function overloads based on CUDA target attributes")
+LANGOPT(CUDAAllowVariadicFunctions, 1, 0, "Allow variadic functions in CUDA device code")
 
 LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")
 LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")

Modified: cfe/trunk/include/clang/Driver/CC1Options.td
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/CC1Options.td?rev=258822&r1=258821&r2=258822&view=diff
==
--- cfe/trunk/include/clang/Driver/CC1Options.td (original)
+++ cfe/trunk/include/clang/Driver/CC1Options.td Tue Jan 26 11:47:20 2016
@@ -678,6 +678,8 @@ def fcuda_include_gpubinary : Separate<[
   HelpText<"Incorporate CUDA device-side binary into host object file.">;
 def fcuda_target_overloads : Flag<["-"], "fcuda-target-overloads">,
   HelpText<"Enable function overloads based on CUDA target attributes.">;
+def fcuda_allow_variadic_functions : Flag<["-"], "fcuda-allow-variadic-functions">,
+  HelpText<"Allow variadic functions in CUDA device code.">;
 
 
//===--===//
 // OpenMP Options

Modified: cfe/trunk/lib/Frontend/CompilerInvocation.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInvocation.cpp?rev=258822&r1=258821&r2=258822&view=diff
==
--- cfe/trunk/lib/Frontend/CompilerInvocation.cpp (original)
+++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp Tue Jan 26 11:47:20 2016
@@ -1521,6 +1521,9 @@ static void ParseLangArgs(LangOptions 
   if (Args.hasArg(OPT_fcuda_target_overloads))
     Opts.CUDATargetOverloads = 1;
 
+  if (Args.hasArg(OPT_fcuda_allow_variadic_functions))
+    Opts.CUDAAllowVariadicFunctions = 1;
+
   if (Opts.ObjC1) {
     if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
       StringRef value = arg->getValue();

Modified: cfe/trunk/lib/Sema/SemaDecl.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Sema/SemaDecl.cpp?rev=258822&r1=258821&r2=258822&view=diff
==
--- cfe/trunk/lib/Sema/SemaDecl.cpp (original)
+++ cfe/trunk/lib/Sema/SemaDecl.cpp Tue Jan 26 11:47:20 2016
@@ -8290,9 +8290,11 @@ Sema::ActOnFunctionDeclarator(Scope *S,
 }
 
     // Variadic functions, other than a *declaration* of printf, are not allowed
-    // in device-side CUDA code.
-    if (NewFD->isVariadic() && (NewFD->hasAttr<CUDADeviceAttr>() ||
-                                NewFD->hasAttr<CUDAGlobalAttr>()) &&
+    // in device-side CUDA code, unless someone passed
+    // -fcuda-allow-variadic-functions.
+    if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&
+        (NewFD->hasAttr<CUDADeviceAttr>() ||
+         NewFD->hasAttr<CUDAGlobalAttr>()) &&
         !(II && II->isStr("printf") && NewFD->isExternC() &&
           !D.isFunctionDefinition())) {
       Diag(NewFD->getLocation(), diag::err_variadic_device_fn);

Modified: cfe/trunk/test/SemaCUDA/vararg.cu
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/SemaCUDA/vararg.cu?rev=258822&r1=258821&r2=258822&view=diff
==
--- cfe/trunk/test/SemaCUDA/vararg.cu (original)
+++ cfe/trunk/test/SemaCUDA/vararg.cu Tue Jan 26 11:47:20 2016
@@ -1,8 +1,11 @@
 // REQUIRES: x86-registered-target
 // REQUIRES: nvptx-registered-target
 // RUN: 
