[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

I have just pushed a fix, revision 310433.


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Aleksey Shlyapnikov via Phabricator via cfe-commits
alekseyshl added a comment.

Driver/openmp-offload.c still fails on 
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/7038, 
please fix.


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D29654#835548, @hfinkel wrote:

> In https://reviews.llvm.org/D29654#835392, @gtbercea wrote:
>
> > In https://reviews.llvm.org/D29654#835371, @arphaman wrote:
> >
> > > The last RUN line in the new commit triggers the same assertion failure:
> >
>
>
> ...
>
> > Hi Alex, I'm not sure why it's failing as I can't reproduce the error 
> > locally. Do you have access to a machine with the configuration the test 
> > uses?
>
> Can you reproduce if you specifically force the host target to 
> x86_64-apple-darwin17.0.0 (e.g., you pass -target x86_64-apple-darwin17.0.0)?


Yep, I can ! I didn't realize that would actually work thanks! :)


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added a comment.

In https://reviews.llvm.org/D29654#835392, @gtbercea wrote:

> In https://reviews.llvm.org/D29654#835371, @arphaman wrote:
>
> > The last RUN line in the new commit triggers the same assertion failure:
>


...

> Hi Alex, I'm not sure why it's failing as I can't reproduce the error 
> locally. Do you have access to a machine with the configuration the test uses?

Can you reproduce if you specifically force the host target to 
x86_64-apple-darwin17.0.0 (e.g., you pass -target x86_64-apple-darwin17.0.0)?


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

No, entry 1 has a different pointer (0x000111c01320 vs 0x000111c017d0).

Why is there a `DependentBoundArch` if it's always empty?


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D29654#835507, @arphaman wrote:

> In https://reviews.llvm.org/D29654#835501, @gtbercea wrote:
>
> > Is that the last access to CachedResults before the error?
>
>
> Is the assertion the last access? Yes.
>
> There must be a discrepancy between
>
> `UI.DependentBoundArch` in the loop above and `BoundArch` that's used to 
> compute `TargetTC`, otherwise `GetTriplePlusArchString` would return the key 
> that matches the `0x000111c017d0` pointer, i.e. without the additional 
> x86_64.


Maybe I misunderstood the output you pasted but it looks like ActionTC contains 
the same triple+arch that you can find in entry [1] in the map.


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

In https://reviews.llvm.org/D29654#835501, @gtbercea wrote:

> Is that the last access to CachedResults before the error?


Is the assertion the last access? Yes.

There must be a discrepancy between

`UI.DependentBoundArch` in the loop above and `BoundArch` that's used to 
compute `TargetTC`, otherwise `GetTriplePlusArchString` would return the key 
that matches the `0x000111c017d0` pointer, i.e. without the additional 
x86_64.


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

Is that the last access to CachedResults before the error?


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D29654#835429, @arphaman wrote:

> The "x86_64-apple-darwin17.0.0-x86_64-host" triple looks suspicious though


It looks like the triple is in the list though:

second = "x86_64-apple-darwin17.0.0-x86_64-host

it is entry [1].


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

The "x86_64-apple-darwin17.0.0-x86_64-host" triple looks suspicious though


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

If you look at the map then you can see that it contains very similar keys, but 
not the exact one:

  first = {
first = 0x000111c017d0
second = "x86_64-apple-darwin17.0.0-host"
  }
  
  and
  
  first = {
first = 0x000111c01320
second = "x86_64-apple-darwin17.0.0-x86_64-host"
  }


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

The cached results map doesn't have the key:

  (lldb) p CachedResults
  (std::__1::map >, clang::driver::InputInfo, 
std::__1::less > >, std::__1::allocator >, 
clang::driver::InputInfo> > >) $0 = size=8 {
[0] = {
  first = {
first = 0x000111c01320
second = "nvptx64-nvidia-cuda-host"
  }
  second = {
Data = {
  Filename = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  InputArg = 0x7fff5fbff8f8
}
Kind = Filename
Act = 0x000111c01320
Type = TY_Object
BaseInput = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  }
}
[1] = {
  first = {
first = 0x000111c01320
second = "x86_64-apple-darwin17.0.0-x86_64-host"
  }
  second = {
Data = {
  Filename = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  InputArg = 0x7fff5fbff8f8
}
Kind = Filename
Act = 0x000111c01320
Type = TY_Object
BaseInput = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  }
}
[2] = {
  first = {
first = 0x000111c01420
second = "nvptx64-nvidia-cuda-host"
  }
  second = {
Data = {
  Filename = 0x7fff5fbff949 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o"
  InputArg = 0x7fff5fbff949
}
Kind = Filename
Act = 0x000111c01420
Type = TY_Object
BaseInput = 0x7fff5fbff949 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o"
  }
}
[3] = {
  first = {
first = 0x000111c017d0
second = "nvptx64-nvidia-cuda-openmp"
  }
  second = {
Data = {
  Filename = 0x000111c048b0 
"/var/folders/sh/cpr85hld32qf79m8x7vd31bwgn/T/openmp-offload-e30496.o"
  InputArg = 0x000111c048b0
}
Kind = Filename
Act = 0x000111c017d0
Type = TY_Object
BaseInput = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  }
}
[4] = {
  first = {
first = 0x000111c017d0
second = "x86_64-apple-darwin17.0.0-host"
  }
  second = {
Data = {
  Filename = 0x000111c04830 
"/var/folders/sh/cpr85hld32qf79m8x7vd31bwgn/T/openmp-offload-b856ec.o"
  InputArg = 0x000111c04830
}
Kind = Filename
Act = 0x000111c017d0
Type = TY_Object
BaseInput = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  }
}
[5] = {
  first = {
first = 0x000111c01900
second = "nvptx64-nvidia-cuda-openmp"
  }
  second = {
Data = {
  Filename = 0x000111c035d0 
"/var/folders/sh/cpr85hld32qf79m8x7vd31bwgn/T/openmp-offload-be86a1.o"
  InputArg = 0x000111c035d0
}
Kind = Filename
Act = 0x000111c01900
Type = TY_Object
BaseInput = 0x7fff5fbff949 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o"
  }
}
[6] = {
  first = {
first = 0x000111c01900
second = "x86_64-apple-darwin17.0.0-host"
  }
  second = {
Data = {
  Filename = 0x000111c034e0 
"/var/folders/sh/cpr85hld32qf79m8x7vd31bwgn/T/openmp-offload-92791a.o"
  InputArg = 0x000111c034e0
}
Kind = Filename
Act = 0x000111c01900
Type = TY_Object
BaseInput = 0x7fff5fbff949 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp2.o"
  }
}
[7] = {
  first = {
first = 0x000111c01a90
second = "nvptx64-nvidia-cuda-openmp"
  }
  second = {
Data = {
  Filename = 0x000111c03d90 
"/var/folders/sh/cpr85hld32qf79m8x7vd31bwgn/T/openmp-offload-8db204.out"
  InputArg = 0x000111c03d90
}
Kind = Filename
Act = 0x000111c01a90
Type = TY_Image
BaseInput = 0x7fff5fbff8f8 
"/Volumes/newAPFS/bisect/b/tools/clang/test/Driver/Output/openmp-offload.c.tmp1.o"
  }
}
  }

Key:

  (lldb) p ActionTC
  (std::__1::pair, std::__1::allocator > >) $1 = {
first = 0x000111c017d0
second = "x86_64-apple-darwin17.0.0-x86_64-host"
  }


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

In https://reviews.llvm.org/D29654#835392, @gtbercea wrote:

> In https://reviews.llvm.org/D29654#835371, @arphaman wrote:
>
> > The last RUN line in the new commit triggers the same assertion failure:
> >
> >   Assertion failed: (CachedResults.find(ActionTC) != CachedResults.end() && 
> > "Result does not exist??"), function BuildJobsForActionNoCache, file 
> > /Users/alex/bisect/llvm/tools/clang/lib/Driver/Driver.cpp, line 3419.
> >
> >
> > backtrace:
> >
> >   * frame #0: 0x7fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
> > frame #1: 0x7fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
> > frame #2: 0x7fffbf30041f libsystem_c.dylib`abort + 127
> > frame #3: 0x7fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
> > frame #4: 0x000103a1f311 
> > clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518,
> >  C=0x000111c01130, A=0x000111c017d0, TC=0x00011300, 
> > BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, 
> > MultipleArchs=false, LinkingOutput=0x, 
> > CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at 
> > Driver.cpp:3418
> > frame #5: 0x000103a1cf51 
> > clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
> > C=0x000111c01130, A=0x000111c017d0, TC=0x00011300, 
> > BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, 
> > MultipleArchs=false, LinkingOutput=0x, 
> > CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at 
> > Driver.cpp:3210
> > frame #6: 0x000103a1dff3 
> > clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518,
> >  C=0x000111c01130, A=0x000111c01a30, TC=0x00011300, 
> > BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, 
> > MultipleArchs=false, LinkingOutput=0x, 
> > CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at 
> > Driver.cpp:3348
> > frame #7: 0x000103a1cf51 
> > clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
> > C=0x000111c01130, A=0x000111c01af0, TC=0x00011300, 
> > BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, 
> > MultipleArchs=false, LinkingOutput=0x, 
> > CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at 
> > Driver.cpp:3210
> > frame #8: 0x000103a1db7e 
> > clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518,
> >  C=0x000111c01130, A=0x000111c014f0, TC=0x00011300, 
> > BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
> > MultipleArchs=false, LinkingOutput=0x, 
> > CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at 
> > Driver.cpp:3310
> > frame #9: 0x000103a1cf51 
> > clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
> > C=0x000111c01130, A=0x000111c014f0, TC=0x00011300, 
> > BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
> > MultipleArchs=false, LinkingOutput=0x, 
> > CachedResults=size=8, TargetDeviceOffloadKind=OFK_None) const at 
> > Driver.cpp:3210
> > frame #10: 0x000103a0a602 
> > clang`clang::driver::Driver::BuildJobs(this=0x7fff5fbfe518, 
> > C=0x000111c01130) const at Driver.cpp:2843
> > frame #11: 0x000103a00b9c 
> > clang`clang::driver::Driver::BuildCompilation(this=0x7fff5fbfe518, 
> > ArgList=ArrayRef @ 0x7fff5fbfc218) at Driver.cpp:746
> > frame #12: 0x00015a92 clang`main(argc_=7, 
> > argv_=0x7fff5fbff6a8) at driver.cpp:463
> > frame #13: 0x7fffbf260c05 libdyld.dylib`start + 1
> > frame #14: 0x7fffbf260c05 libdyld.dylib`start + 1
> >   
>
>
> Hi Alex, I'm not sure why it's failing as I can't reproduce the error 
> locally. Do you have access to a machine with the configuration the test uses?


Yes, I can reproduce it on my machine.


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D29654#835371, @arphaman wrote:

> The last RUN line in the new commit triggers the same assertion failure:
>
>   Assertion failed: (CachedResults.find(ActionTC) != CachedResults.end() && 
> "Result does not exist??"), function BuildJobsForActionNoCache, file 
> /Users/alex/bisect/llvm/tools/clang/lib/Driver/Driver.cpp, line 3419.
>
>
> backtrace:
>
>   * frame #0: 0x7fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x7fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
> frame #2: 0x7fffbf30041f libsystem_c.dylib`abort + 127
> frame #3: 0x7fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
> frame #4: 0x000103a1f311 
> clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518,
>  C=0x000111c01130, A=0x000111c017d0, TC=0x00011300, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418
> frame #5: 0x000103a1cf51 
> clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
> C=0x000111c01130, A=0x000111c017d0, TC=0x00011300, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
> frame #6: 0x000103a1dff3 
> clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518,
>  C=0x000111c01130, A=0x000111c01a30, TC=0x00011300, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348
> frame #7: 0x000103a1cf51 
> clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
> C=0x000111c01130, A=0x000111c01af0, TC=0x00011300, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
> frame #8: 0x000103a1db7e 
> clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518,
>  C=0x000111c01130, A=0x000111c014f0, TC=0x00011300, 
> BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310
> frame #9: 0x000103a1cf51 
> clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
> C=0x000111c01130, A=0x000111c014f0, TC=0x00011300, 
> BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
> frame #10: 0x000103a0a602 
> clang`clang::driver::Driver::BuildJobs(this=0x7fff5fbfe518, 
> C=0x000111c01130) const at Driver.cpp:2843
> frame #11: 0x000103a00b9c 
> clang`clang::driver::Driver::BuildCompilation(this=0x7fff5fbfe518, 
> ArgList=ArrayRef @ 0x7fff5fbfc218) at Driver.cpp:746
> frame #12: 0x00015a92 clang`main(argc_=7, 
> argv_=0x7fff5fbff6a8) at driver.cpp:463
> frame #13: 0x7fffbf260c05 libdyld.dylib`start + 1
> frame #14: 0x7fffbf260c05 libdyld.dylib`start + 1
>   


Hi Alex, I'm not sure why it's failing as I can't reproduce the error locally. 
Do you have access to a machine with the configuration the test uses?


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

The last RUN line in the new commit triggers the same assertion failure:

  Assertion failed: (CachedResults.find(ActionTC) != CachedResults.end() && 
"Result does not exist??"), function BuildJobsForActionNoCache, file 
/Users/alex/bisect/llvm/tools/clang/lib/Driver/Driver.cpp, line 3419.

backtrace:

  * frame #0: 0x7fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
frame #2: 0x7fffbf30041f libsystem_c.dylib`abort + 127
frame #3: 0x7fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
frame #4: 0x000103a1f311 
clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518, 
C=0x000111c01130, A=0x000111c017d0, TC=0x00011300, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418
frame #5: 0x000103a1cf51 
clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
C=0x000111c01130, A=0x000111c017d0, TC=0x00011300, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
frame #6: 0x000103a1dff3 
clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518, 
C=0x000111c01130, A=0x000111c01a30, TC=0x00011300, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348
frame #7: 0x000103a1cf51 
clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
C=0x000111c01130, A=0x000111c01af0, TC=0x00011300, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
frame #8: 0x000103a1db7e 
clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe518, 
C=0x000111c01130, A=0x000111c014f0, TC=0x00011300, 
BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310
frame #9: 0x000103a1cf51 
clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe518, 
C=0x000111c01130, A=0x000111c014f0, TC=0x00011300, 
BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
frame #10: 0x000103a0a602 
clang`clang::driver::Driver::BuildJobs(this=0x7fff5fbfe518, 
C=0x000111c01130) const at Driver.cpp:2843
frame #11: 0x000103a00b9c 
clang`clang::driver::Driver::BuildCompilation(this=0x7fff5fbfe518, 
ArgList=ArrayRef @ 0x7fff5fbfc218) at Driver.cpp:746
frame #12: 0x00015a92 clang`main(argc_=7, argv_=0x7fff5fbff6a8) 
at driver.cpp:463
frame #13: 0x7fffbf260c05 libdyld.dylib`start + 1
frame #14: 0x7fffbf260c05 libdyld.dylib`start + 1


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

Looks like it's still failing unfortunately: 
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental_check/39182/console


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D29654#835256, @arphaman wrote:

> Great, thanks! I think that you can just revert my revert with the fix 
> applied in one commit


Hi Alex,

I just commited the changes again.

Let me know if it still fails for you. I think the issue was actually that the 
test wasn't quite correct so I cleaned that up.

--Doru


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

Great, thanks! I think that you can just revert my revert with the fix applied 
in one commit


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

In https://reviews.llvm.org/D29654#835045, @arphaman wrote:

> Hi @gtbercea,
>  I couldn't reply to the email as cfe-commits didn't even register this 
> commit somehow, so I'm replying here.
>
> Unfortunately I had to revert this commit (r310291), + two others for a clean 
> revert (r310300 and r310332) because it caused a test failure on macOS. This 
> particular run line:
>
>   // RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp 
> -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o 
> %t2.o 2>&1 \
>   // RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s
>
>
> Causes the following assertion failure:
>
>   assert(CachedResults.find(ActionTC) != CachedResults.end() &&
>  "Result does not exist??");
>   
>
> Here's a backtrace:
>
>   * frame #0: 0x7fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x7fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
> frame #2: 0x7fffbf30041f libsystem_c.dylib`abort + 127
> frame #3: 0x7fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
> frame #4: 0x000103a1f2d1 
> clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe4e8,
>  C=0x000111b11830, A=0x000111b11ed0, TC=0x000112819000, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418
> frame #5: 0x000103a1cf11 
> clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe4e8, 
> C=0x000111b11830, A=0x000111b11ed0, TC=0x000112819000, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
> frame #6: 0x000103a1dfb3 
> clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe4e8,
>  C=0x000111b11830, A=0x000111b12130, TC=0x000112819000, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348
> frame #7: 0x000103a1cf11 
> clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe4e8, 
> C=0x000111b11830, A=0x000111b121f0, TC=0x000112819000, 
> BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
> frame #8: 0x000103a1db3e 
> clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe4e8,
>  C=0x000111b11830, A=0x000111b11bf0, TC=0x000112819000, 
> BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310
> frame #9: 0x000103a1cf11 
> clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe4e8, 
> C=0x000111b11830, A=0x000111b11bf0, TC=0x000112819000, 
> BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
> MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
> TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
> frame #10: 0x000103a0a5c2 
> clang`clang::driver::Driver::BuildJobs(this=0x7fff5fbfe4e8, 
> C=0x000111b11830) const at Driver.cpp:2843
> frame #11: 0x000103a00b5c 
> clang`clang::driver::Driver::BuildCompilation(this=0x7fff5fbfe4e8, 
> ArgList=ArrayRef @ 0x7fff5fbfc1e8) at Driver.cpp:746
> frame #12: 0x00015a52 clang`main(argc_=9, 
> argv_=0x7fff5fbff670) at driver.cpp:463
> frame #13: 0x7fffbf260c05 libdyld.dylib`start + 1
>   
>
> Could you please take a look?
>
> Let me know if you need anything else,
>  Cheers,
> Alex


Hi Alex,

I have a fix for the failing test. What's the easiest way to do this? Do I have 
to commit those patches again or can you push them back in and then I also push 
the fix?

Thanks,

--Doru

--


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

The revert commit is r310345 btw


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-08 Thread Alex Lorenz via Phabricator via cfe-commits
arphaman added a comment.

Hi @gtbercea,
I couldn't reply to the email as cfe-commits didn't even register this commit 
somehow, so I'm replying here.

Unfortunately I had to revert this commit (r310291), + two others for a clean 
revert (r310300 and r310332) because it caused a test failure on macOS. This 
particular run line:

  // RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp 
-fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o 
%t2.o 2>&1 \
  // RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s

Causes the following assertion failure:

  assert(CachedResults.find(ActionTC) != CachedResults.end() &&
 "Result does not exist??");

Here's a backtrace:

  * frame #0: 0x7fffbf3a2b2e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fffbf4c72de libsystem_pthread.dylib`pthread_kill + 303
frame #2: 0x7fffbf30041f libsystem_c.dylib`abort + 127
frame #3: 0x7fffbf2c9f34 libsystem_c.dylib`__assert_rtn + 320
frame #4: 0x000103a1f2d1 
clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe4e8, 
C=0x000111b11830, A=0x000111b11ed0, TC=0x000112819000, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3418
frame #5: 0x000103a1cf11 
clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe4e8, 
C=0x000111b11830, A=0x000111b11ed0, TC=0x000112819000, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=false, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
frame #6: 0x000103a1dfb3 
clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe4e8, 
C=0x000111b11830, A=0x000111b12130, TC=0x000112819000, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3348
frame #7: 0x000103a1cf11 
clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe4e8, 
C=0x000111b11830, A=0x000111b121f0, TC=0x000112819000, 
BoundArch=(Data = "x86_64", Length = 6), AtTopLevel=true, MultipleArchs=false, 
LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
frame #8: 0x000103a1db3e 
clang`clang::driver::Driver::BuildJobsForActionNoCache(this=0x7fff5fbfe4e8, 
C=0x000111b11830, A=0x000111b11bf0, TC=0x000112819000, 
BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3310
frame #9: 0x000103a1cf11 
clang`clang::driver::Driver::BuildJobsForAction(this=0x7fff5fbfe4e8, 
C=0x000111b11830, A=0x000111b11bf0, TC=0x000112819000, 
BoundArch=(Data = 0x, Length = 0), AtTopLevel=true, 
MultipleArchs=false, LinkingOutput=0x, CachedResults=size=8, 
TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:3210
frame #10: 0x000103a0a5c2 
clang`clang::driver::Driver::BuildJobs(this=0x7fff5fbfe4e8, 
C=0x000111b11830) const at Driver.cpp:2843
frame #11: 0x000103a00b5c 
clang`clang::driver::Driver::BuildCompilation(this=0x7fff5fbfe4e8, 
ArgList=ArrayRef @ 0x7fff5fbfc1e8) at Driver.cpp:746
frame #12: 0x00015a52 clang`main(argc_=9, argv_=0x7fff5fbff670) 
at driver.cpp:463
frame #13: 0x7fffbf260c05 libdyld.dylib`start + 1

Could you please take a look?

Let me know if you need anything else,
Cheers,
Alex


https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-08-07 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 110056.
gtbercea added a comment.

Add -no-canonical-prefixes to tests.


https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -629,3 +629,43 @@
 // RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s
 
 // CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
+
+/// ###
+
+/// Check -Xopenmp-target uses one of the archs provided when several archs are used.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_35 -Xopenmp-target -march=sm_60 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-ARCHS %s
+
+// CHK-FOPENMP-TARGET-ARCHS: ptxas{{.*}}" "--gpu-name" "sm_60"
+// CHK-FOPENMP-TARGET-ARCHS: nvlink{{.*}}" "-arch" "sm_60"
+
+/// ###
+
+/// Check -Xopenmp-target -march=sm_35 works as expected when two triples are present.
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_35 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-FOPENMP-TARGET-COMPILATION %s
+
+// CHK-FOPENMP-TARGET-COMPILATION: ptxas{{.*}}" "--gpu-name" "sm_35"
+// CHK-FOPENMP-TARGET-COMPILATION: nvlink{{.*}}" "-arch" "sm_35"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.cubin" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   touch %t1.o
+// RUN:   touch %t2.o
+// RUN:   %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s
+
+// CHK-TWOCUBIN: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp1.o" "-outputs={{.*}}.o,{{.*}}tmp1-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp2.o" "-outputs={{.*}}.o,{{.*}}tmp2-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload.c.tmp1-openmp-nvptx64-nvidia-cuda.cubin" "openmp-offload.c.tmp2-openmp-nvptx64-nvidia-cuda.cubin"
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -204,131 +204,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = 

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-06-28 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 104535.
gtbercea added a comment.

[Update regression tests] Add a test for propagating the compute capability to 
the OpenMP device offloading toolchain which targets NVIDIA GPUs.
This is a test for patch D34784  which is 
enabled by this patch.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -607,3 +607,34 @@
 // RUN:   | FileCheck -check-prefix=CHK-COMPUTE-CAPABILITY %s
 
 // CHK-COMPUTE-CAPABILITY: clang: warning: argument unused during compilation: '-fopenmp-target-arch=sm_35'
+
+/// ###
+
+/// Check -fopenmp-target-arch propagates compute capability to device offloading toolchain.
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes -fopenmp-target-arch=sm_35 %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-GPU-COMPUTE-CAPABILITY %s
+
+// CHK-GPU-COMPUTE-CAPABILITY: ptxas{{.*}}" "--gpu-name" "sm_35"
+// CHK-GPU-COMPUTE-CAPABILITY-NEXT: nvlink{{.*}}" "-arch" "sm_35"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.cubin" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   touch %t1.o
+// RUN:   touch %t2.o
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s
+
+// CHK-TWOCUBIN: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp1.o" "-outputs={{.*}}.o,{{.*}}tmp1-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp2.o" "-outputs={{.*}}.o,{{.*}}tmp2-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload.c.tmp1-openmp-nvptx64-nvidia-cuda.cubin" "openmp-offload.c.tmp2-openmp-nvptx64-nvidia-cuda.cubin"
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // 

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-06-21 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel accepted this revision.
hfinkel added a comment.
This revision is now accepted and ready to land.

LGTM


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-05-12 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 98766.
gtbercea added a comment.

Address comments.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -590,6 +590,28 @@
 
 /// ###
 
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.cubin" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   touch %t1.o
+// RUN:   touch %t2.o
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s
+
+// CHK-TWOCUBIN: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp1.o" "-outputs={{.*}}.o,{{.*}}tmp1-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp2.o" "-outputs={{.*}}.o,{{.*}}tmp2-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload.c.tmp1-openmp-nvptx64-nvidia-cuda.cubin" "openmp-offload.c.tmp2-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
 /// Check PTXAS is passed -c flag when offloading to an NVIDIA device using OpenMP.
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS-DEFAULT %s
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // associated with each device input.
-  auto OpenMPToolChains = C.getOffloadToolChains();
-  assert(OpenMPToolChains.first != OpenMPToolChains.second &&
- "No OpenMP toolchains??");
-
-  // Track the input file name and device triple in order to build the script,
-  // inserting binaries in the designated sections.
-  SmallVector, 8> InputBinaryInfo;
-
-  // Add commands to embed target binaries. We ensure that each section and
-  // image is 16-byte aligned. This is not mandatory, but increases the
-  // likelihood of data to be aligned 

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-05-09 Thread Hal Finkel via Phabricator via cfe-commits
hfinkel added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:388
+
+  // add paths specified in LIBRARY_PATH environment variable as -L options.
+  addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");

Comment sentences should start with a capital letter (here and below).



Comment at: lib/Driver/ToolChains/Cuda.cpp:407
+  // the target details to the driver and maybe we do not want to do
+  // that
+  for (const auto  : Inputs) {

Add period at the end of the sentence.



Comment at: lib/Driver/ToolChains/Cuda.cpp:409
+  for (const auto  : Inputs) {
+
+if (II.getType() == types::TY_LLVM_IR ||

Remove this blank line.



Comment at: lib/Driver/ToolChains/Cuda.cpp:420
+// Currently, we only pass the input files to the linker, we do not pass
+// any libraries that may be valid only for the host.
+if (!II.isFilename())

Can you please explain how this works for libraries specified by file name 
(e.g. /path/to/foo.so or /path/to/foo.a) or why these should be treated 
differently? Maybe I'm misunderstanding what this check does.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-05-09 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

ping


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-05-01 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea added a comment.

@rnk any further thought on the changes to this patch? Thanks :)


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-05-01 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 97331.
gtbercea added a comment.

Change output of unbundler to produce cubin files when using OpenMP to offload 
to NVIDIA GPUs.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -590,6 +590,28 @@
 
 /// ###
 
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.cubin" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
+/// Check cubin file generation and usage by nvlink
+// RUN:   touch %t1.o
+// RUN:   touch %t2.o
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-TWOCUBIN %s
+
+// CHK-TWOCUBIN: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp1.o" "-outputs={{.*}}.o,{{.*}}tmp1-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: clang-offload-bundler" "-type=o" "{{.*}}inputs={{.*}}tmp2.o" "-outputs={{.*}}.o,{{.*}}tmp2-openmp-nvptx64-nvidia-cuda.cubin" "-unbundle"
+// CHK-TWOCUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload.c.tmp1-openmp-nvptx64-nvidia-cuda.cubin" "openmp-offload.c.tmp2-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
 /// Check PTXAS is passed -c flag when offloading to an NVIDIA device using OpenMP.
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS-DEFAULT %s
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // associated with each device input.
-  auto OpenMPToolChains = C.getOffloadToolChains();
-  assert(OpenMPToolChains.first != OpenMPToolChains.second &&
- "No OpenMP toolchains??");
-
-  // Track the input file name and device triple in order to build the script,
-  // inserting binaries in the designated sections.
-  SmallVector, 8> InputBinaryInfo;
-
-  // Add commands to embed target binaries. We ensure that each section and
-  // image is 16-byte aligned. This 

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-04-19 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 95737.
gtbercea marked an inline comment as done.
gtbercea added a comment.

Avoid renaming by enabling PTXAS to generate an output file with the 
appropriate extension, in this case a cubin extension.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -590,6 +590,16 @@
 
 /// ###
 
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.cubin" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
 /// Check PTXAS is passed -c flag when offloading to an NVIDIA device using OpenMP.
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS-DEFAULT %s
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // associated with each device input.
-  auto OpenMPToolChains = C.getOffloadToolChains();
-  assert(OpenMPToolChains.first != OpenMPToolChains.second &&
- "No OpenMP toolchains??");
-
-  // Track the input file name and device triple in order to build the script,
-  // inserting binaries in the designated sections.
-  SmallVector, 8> InputBinaryInfo;
-
-  // Add commands to embed target binaries. We ensure that each section and
-  // image is 16-byte aligned. This is not mandatory, but increases the
-  // likelihood of data to be aligned with a cache block in several main host
-  // machines.
-  LksStream << "/*\n";
-  LksStream << "   OpenMP Offload Linker Script\n";
-  LksStream << " *** Automatically generated by Clang ***\n";
-  LksStream << "*/\n";
-  LksStream << "TARGET(binary)\n";
-  auto DTC = OpenMPToolChains.first;
-  for (auto  : Inputs) {
-const Action *A = II.getAction();
-// Is this a device linking action?
-if (A && isa(A) &&
-A->isDeviceOffloading(Action::OFK_OpenMP)) {
-  assert(DTC != OpenMPToolChains.second &&
- "More device inputs than device toolchains??");
-  InputBinaryInfo.push_back(std::make_pair(
-  DTC->second->getTriple().normalize(), II.getFilename()));
-  ++DTC;
-  LksStream << "INPUT(" << 

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-04-13 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea marked 4 inline comments as done.
gtbercea added inline comments.



Comment at: test/Driver/openmp-offload.c:594
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s

rnk wrote:
> In this case, it would be nicer if you arranged for ptxas to output to a 
> .cubin file.
@rnk I'm not sure what you mean by this comment, can you please clarify?


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-04-13 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 95175.
gtbercea added a comment.

Use the rename() utility function of LLVM for renaming the PTXAS output before 
invoking NVLINK.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -590,6 +590,16 @@
 
 /// ###
 
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" {{.*}} "openmp-offload-openmp-nvptx64-nvidia-cuda.cubin"
+
+/// ###
+
 /// Check PTXAS is passed -c flag when offloading to an NVIDIA device using OpenMP.
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS-DEFAULT %s
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // associated with each device input.
-  auto OpenMPToolChains = C.getOffloadToolChains();
-  assert(OpenMPToolChains.first != OpenMPToolChains.second &&
- "No OpenMP toolchains??");
-
-  // Track the input file name and device triple in order to build the script,
-  // inserting binaries in the designated sections.
-  SmallVector, 8> InputBinaryInfo;
-
-  // Add commands to embed target binaries. We ensure that each section and
-  // image is 16-byte aligned. This is not mandatory, but increases the
-  // likelihood of data to be aligned with a cache block in several main host
-  // machines.
-  LksStream << "/*\n";
-  LksStream << "   OpenMP Offload Linker Script\n";
-  LksStream << " *** Automatically generated by Clang ***\n";
-  LksStream << "*/\n";
-  LksStream << "TARGET(binary)\n";
-  auto DTC = OpenMPToolChains.first;
-  for (auto  : Inputs) {
-const Action *A = II.getAction();
-// Is this a device linking action?
-if (A && isa(A) &&
-A->isDeviceOffloading(Action::OFK_OpenMP)) {
-  assert(DTC != OpenMPToolChains.second &&
- "More device inputs than device toolchains??");
-  InputBinaryInfo.push_back(std::make_pair(
-  DTC->second->getTriple().normalize(), II.getFilename()));
-  ++DTC;
-  LksStream << "INPUT(" << II.getFilename() << ")\n";
-}
-  }
-
-  assert(DTC == OpenMPToolChains.second 

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-04-06 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 94475.
gtbercea added a comment.

Address some of the reviews.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/Driver.cpp
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Cuda.h
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -590,6 +590,17 @@
 
 /// ###
 
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: cp{{.*}}-openmp-nvptx64-nvidia-cuda.o" "{{.*}}-openmp-nvptx64-nvidia-cuda-{{.*}}.cubin"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" "{{.*}}" "{{.*}}-openmp-nvptx64-nvidia-cuda-{{.*}}.cubin"
+
+/// ###
+
 /// Check PTXAS is passed -c flag when offloading to an NVIDIA device using OpenMP.
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS-DEFAULT %s
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // associated with each device input.
-  auto OpenMPToolChains = C.getOffloadToolChains();
-  assert(OpenMPToolChains.first != OpenMPToolChains.second &&
- "No OpenMP toolchains??");
-
-  // Track the input file name and device triple in order to build the script,
-  // inserting binaries in the designated sections.
-  SmallVector, 8> InputBinaryInfo;
-
-  // Add commands to embed target binaries. We ensure that each section and
-  // image is 16-byte aligned. This is not mandatory, but increases the
-  // likelihood of data to be aligned with a cache block in several main host
-  // machines.
-  LksStream << "/*\n";
-  LksStream << "   OpenMP Offload Linker Script\n";
-  LksStream << " *** Automatically generated by Clang ***\n";
-  LksStream << "*/\n";
-  LksStream << "TARGET(binary)\n";
-  auto DTC = OpenMPToolChains.first;
-  for (auto  : Inputs) {
-const Action *A = II.getAction();
-// Is this a device linking action?
-if (A && isa(A) &&
-A->isDeviceOffloading(Action::OFK_OpenMP)) {
-  assert(DTC != OpenMPToolChains.second &&
- "More device inputs than device toolchains??");
-  InputBinaryInfo.push_back(std::make_pair(
-  DTC->second->getTriple().normalize(), II.getFilename()));
-  ++DTC;
-  LksStream << "INPUT(" << II.getFilename() << ")\n";
-}
-  }
-

[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-03-31 Thread Reid Kleckner via Phabricator via cfe-commits
rnk added inline comments.



Comment at: lib/Driver/ToolChains/Cuda.cpp:324
+ "CUDA toolchain not expected for an OpenMP host device.");
+  if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
+if (Output.isFilename()) {

This entire if block constructs a completely unrelated link line from the rest 
of the function. They should be separate functions. In fact, 
`CudaToolChain::buildLinker()` should probably return a different tool 
depending on whether you're using OpenMP or not, and this logic should be in a 
different tool. Maybe NVPTX::Linker should be called NVPTX::FatLink as well.



Comment at: lib/Driver/ToolChains/Cuda.cpp:388-394
+  const char *CopyExec = Args.MakeArgString(getToolChain().GetProgramPath(
+  C.getDriver().IsCLMode() ? "copy" : "cp"));
+  ArgStringList CopyCmdArgs;
+  CopyCmdArgs.push_back(II.getFilename());
+  CopyCmdArgs.push_back(CubinF);
+  C.addCommand(
+  llvm::make_unique(JA, *this, CopyExec, CopyCmdArgs, 
Inputs));

Hahnfeld wrote:
> Copying files around would be something new in Clang that someone more 
> experienced has to weight. @rnk, @hfinkel?
I don't think `copy` is an actual binary on Windows, it's a builtin cmd 
command. At least, that's what where copy suggests. clang-cl can also run on 
Linux, so you probably want to check `#ifdef LLVM_ON_WIN32` for this.

Honestly, I'd prefer it if you instead found a way to get the driver to rename 
the inputs right before executing nvlink. We have cross-platform utilities in 
Support for doing renames. You can probably make some kind of custom Command to 
do this.



Comment at: test/Driver/openmp-offload.c:594
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda 
-save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s

In this case, it would be nicer if you arranged for ptxas to output to a .cubin 
file.



Comment at: test/Driver/openmp-offload.c:601
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" "{{.*}}" 
"{{.*}}-openmp-nvptx64-nvidia-cuda-{{.*}}.cubin"
+
+/// ###

Can you add a test that links two input ptxas .o files into a binary? You can 
use `touch` to make the .o files.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-03-31 Thread Jonas Hahnfeld via Phabricator via cfe-commits
Hahnfeld added a reviewer: rnk.
Hahnfeld added a subscriber: rnk.
Hahnfeld added a comment.

Please format all comments as full sentences.




Comment at: lib/Driver/ToolChains/Cuda.cpp:338
+Args.getAllArgValues(options::OPT_march_EQ);
+assert(GPUArchs.size() == 1 && "Exactly one GPU Arch required for ptxas.");
+const std::string  = GPUArchs[0];

Is this catched somewhere before with a diagnostic?



Comment at: lib/Driver/ToolChains/Cuda.cpp:381
+  StringRef Name = llvm::sys::path::filename(II.getFilename());
+  std::pair Split = Name.rsplit('.');
+  std::string TmpName =

`AddOpenMPLinkerScript` uses `llvm::sys::path::replace_extension` which does 
this magic internally



Comment at: lib/Driver/ToolChains/Cuda.cpp:388-394
+  const char *CopyExec = Args.MakeArgString(getToolChain().GetProgramPath(
+  C.getDriver().IsCLMode() ? "copy" : "cp"));
+  ArgStringList CopyCmdArgs;
+  CopyCmdArgs.push_back(II.getFilename());
+  CopyCmdArgs.push_back(CubinF);
+  C.addCommand(
+  llvm::make_unique(JA, *this, CopyExec, CopyCmdArgs, 
Inputs));

Copying files around would be something new in Clang that someone more 
experienced has to weight. @rnk, @hfinkel?


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-03-27 Thread Justin Lebar via Phabricator via cfe-commits
jlebar added a comment.

With your latest set of updates, I am not sure which, if any, patches you need 
me to take another look at.  Unfortunately I don't have a ton of time for CUDA 
stuff these days, so where possible I'd prefer to shunt reviews over to hfinkel 
or chandlerc.  But do let me know.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D29654: [OpenMP] Integrate OpenMP target region cubin into host binary

2017-03-27 Thread Gheorghe-Teodor Bercea via Phabricator via cfe-commits
gtbercea updated this revision to Diff 93192.
gtbercea added a comment.
Herald added a subscriber: rengolin.

Update patch to reflect latest source code changes.


Repository:
  rL LLVM

https://reviews.llvm.org/D29654

Files:
  lib/Driver/ToolChains/CommonArgs.cpp
  lib/Driver/ToolChains/CommonArgs.h
  lib/Driver/ToolChains/Cuda.cpp
  lib/Driver/ToolChains/Gnu.cpp
  test/Driver/openmp-offload.c

Index: test/Driver/openmp-offload.c
===
--- test/Driver/openmp-offload.c
+++ test/Driver/openmp-offload.c
@@ -590,6 +590,17 @@
 
 /// ###
 
+/// Check cubin file generation and usage by nvlink
+// RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
+// RUN:   | FileCheck -check-prefix=CHK-CUBIN %s
+
+// CHK-CUBIN: clang{{.*}}" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: ptxas{{.*}}" "--output-file" "{{.*}}-openmp-nvptx64-nvidia-cuda.o" "{{.*}}-openmp-nvptx64-nvidia-cuda.s"
+// CHK-CUBIN-NEXT: cp{{.*}}-openmp-nvptx64-nvidia-cuda.o" "{{.*}}-openmp-nvptx64-nvidia-cuda-{{.*}}.cubin"
+// CHK-CUBIN-NEXT: nvlink" "-o" "{{.*}}-openmp-nvptx64-nvidia-cuda" "{{.*}}" "{{.*}}-openmp-nvptx64-nvidia-cuda-{{.*}}.cubin"
+
+/// ###
+
 /// Check PTXAS is passed -c flag when offloading to an NVIDIA device using OpenMP.
 // RUN:   %clang -### -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-PTXAS %s
Index: lib/Driver/ToolChains/Gnu.cpp
===
--- lib/Driver/ToolChains/Gnu.cpp
+++ lib/Driver/ToolChains/Gnu.cpp
@@ -203,131 +203,6 @@
   // The types are (hopefully) good enough.
 }
 
-/// Add OpenMP linker script arguments at the end of the argument list so that
-/// the fat binary is built by embedding each of the device images into the
-/// host. The linker script also defines a few symbols required by the code
-/// generation so that the images can be easily retrieved at runtime by the
-/// offloading library. This should be used only in tool chains that support
-/// linker scripts.
-static void AddOpenMPLinkerScript(const ToolChain , Compilation ,
-  const InputInfo ,
-  const InputInfoList ,
-  const ArgList , ArgStringList ,
-  const JobAction ) {
-
-  // If this is not an OpenMP host toolchain, we don't need to do anything.
-  if (!JA.isHostOffloading(Action::OFK_OpenMP))
-return;
-
-  // Create temporary linker script. Keep it if save-temps is enabled.
-  const char *LKS;
-  SmallString<256> Name = llvm::sys::path::filename(Output.getFilename());
-  if (C.getDriver().isSaveTempsEnabled()) {
-llvm::sys::path::replace_extension(Name, "lk");
-LKS = C.getArgs().MakeArgString(Name.c_str());
-  } else {
-llvm::sys::path::replace_extension(Name, "");
-Name = C.getDriver().GetTemporaryPath(Name, "lk");
-LKS = C.addTempFile(C.getArgs().MakeArgString(Name.c_str()));
-  }
-
-  // Add linker script option to the command.
-  CmdArgs.push_back("-T");
-  CmdArgs.push_back(LKS);
-
-  // Create a buffer to write the contents of the linker script.
-  std::string LksBuffer;
-  llvm::raw_string_ostream LksStream(LksBuffer);
-
-  // Get the OpenMP offload tool chains so that we can extract the triple
-  // associated with each device input.
-  auto OpenMPToolChains = C.getOffloadToolChains();
-  assert(OpenMPToolChains.first != OpenMPToolChains.second &&
- "No OpenMP toolchains??");
-
-  // Track the input file name and device triple in order to build the script,
-  // inserting binaries in the designated sections.
-  SmallVector, 8> InputBinaryInfo;
-
-  // Add commands to embed target binaries. We ensure that each section and
-  // image is 16-byte aligned. This is not mandatory, but increases the
-  // likelihood of data to be aligned with a cache block in several main host
-  // machines.
-  LksStream << "/*\n";
-  LksStream << "   OpenMP Offload Linker Script\n";
-  LksStream << " *** Automatically generated by Clang ***\n";
-  LksStream << "*/\n";
-  LksStream << "TARGET(binary)\n";
-  auto DTC = OpenMPToolChains.first;
-  for (auto  : Inputs) {
-const Action *A = II.getAction();
-// Is this a device linking action?
-if (A && isa(A) &&
-A->isDeviceOffloading(Action::OFK_OpenMP)) {
-  assert(DTC != OpenMPToolChains.second &&
- "More device inputs than device toolchains??");
-  InputBinaryInfo.push_back(std::make_pair(
-  DTC->second->getTriple().normalize(), II.getFilename()));
-  ++DTC;
-  LksStream << "INPUT(" << II.getFilename() << ")\n";
-}
-  }
-
-