[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D128090#3649579 , @tra wrote:

> In D128090#3649235 , @jhuber6 wrote:
>
>> Interesting, may be worthwhile to query that if it exists, though AMD does 
>> this with `amdgpu-arch` which has led to problems for me in the past. But 
>> even if it could be wrong it will still spit out an architecture that would 
>> run on at least one local card rather than zero.
>
> We do have existing precedent of `-march=native`, so it may make sense to 
> introduce `--offload-arch=native`, though it may be a bit too ambiguous, 
> considering that there may be more than one likely choice -- multiple GPUs, 
> possibly a mix from different vendors. Perhaps we would need something more 
> specific, like `--offload-arch=native-nvidia`.

That's an interesting idea, we could put that functionality in OpenMP using 
this method as well.

  -Xopenmp-target=nvptx64 -march=native

For CUDA / HIP we could also potentially just make users specify which 
toolchain the native applies to somewhat like `-Xopenmp-target`. Might be a 
more general solution than attaching it to the architecture string.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D128090#3649235 , @jhuber6 wrote:

> Interesting, may be worthwhile to query that if it exists, though AMD does 
> this with `amdgpu-arch` which has led to problems for me in the past. But 
> even if it could be wrong it will still spit out an architecture that would 
> run on at least one local card rather than zero.

We do have existing precedent of `-march=native`, so it may make sense to 
introduce `--offload-arch=native`, though it may be a bit too ambiguous, 
considering that there may be more than one likely chouce -- multiple GPUs, 
possibly a mix from different vendors. Perhaps we would need something more 
specific, like `--offload-arch=native-nvidia`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D128090#3649202 , @tra wrote:

> In D128090#3649125 , @jhuber6 wrote:
>
>> It just defaults to `sm_35` if CUDA isn't present on the system IIRC. 
>> Alternatively we could ship a tool to derive it at compile time.
>
> As it happens, recent CUDA releases ship with `bin/__nvcc_device_query` which 
> prints out the list of SM capabilities of the GPUs it sees.
>
> Even that may not be the right value. E.g. only some of the GPUs on the 
> machine may be intended for compute. It's not that uncommon to have a puny 
> card to drive the display and one or more compute cards we actually want to 
> compile for. There's no point compiling for a GPU variant which will never do 
> any compute.

Interesting, may be worthwhile to query that if it exists, though AMD does this 
with `amdgpu-arch` which has led to problems for me in the past. But even if it 
could be wrong it will still spit out an architecture that would run on at 
least one local card rather than zero.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D128090#3649125 , @jhuber6 wrote:

> It just defaults to `sm_35` if CUDA isn't present on the system IIRC. 
> Alternatively we could ship a tool to derive it at compile time.

As it happens, recent CUDA releases ship with `bin/__nvcc_device_query` which 
prints out the list of SM capabilities of the GPUs it sees.

Even that may not be the right value. E.g. only some of the GPUs on the machine 
may be intended for compute. It's not that uncommon to have a puny card to 
drive the display and one or more compute cards we actually want to compile 
for. There's no point compiling for a GPU variant which will never do any 
compute.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D128090#3649059 , @tra wrote:

> In D128090#3648999 , @jhuber6 wrote:
>
>> Right now there's `CLANG_OPENMP_NVPTX_DEFAULT_ARCH`, which is defined by 
>> CMake to be the architecture of the system used to build clang
>
> That does not make sense to me. Most of the time clang would be built on a 
> machine without a GPU, so I don't understand how one would derive a sensible 
> value for CLANG_OPENMP_NVPTX_DEFAULT_ARCH there.
> The vast majority of users will use official release builds of clang and that 
> has no conceivable way to give a sensible default for any specific user. Any 
> guess would be OK sometimes, but it would be wrong most of the time.

It just defaults to `sm_35` if CUDA isn't present on the system IIRC. 
Alternatively we could ship a tool to derive it at compile time.

> I'm all for providing a sensible default, but there's no such thing when it 
> comes for GPUs. CUDA falls back on the oldest supported GPU architecture 
> which has the only benefit of working for occasional manual tinkering and is 
> being consistently wrong for about everyone and forcing them to specify the 
> actual offload targets relevant to their use case.
> So far it's the least bad and somewhat consistent approach I've seen.

Sometimes people get tricked into thinking it works by the JIT performed on the 
PTX output. There's an argument to be made that we shouldn't support any 
defaults at all, since architectures like AMDGPU provide no such mutual 
compatibility.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

In D128090#3648999 , @jhuber6 wrote:

> Right now there's `CLANG_OPENMP_NVPTX_DEFAULT_ARCH`, which is defined by 
> CMake to be the architecture of the system used to build clang

That does not make sense to me. Most of the time clang would be built on a 
machine without a GPU, so I don't understand how one would derive a sensible 
value for CLANG_OPENMP_NVPTX_DEFAULT_ARCH there.
The vast majority of users will use official release builds of clang and that 
has no conceivable way to give a sensible default for any specific user. Any 
guess would be OK sometimes, but it would be wrong most of the time.

I'm all for providing a sensible default, but there's no such thing when it 
comes for GPUs. CUDA falls back on the oldest supported GPU architecture which 
has the only benefit of working for occasional manual tinkering and is being 
consistently wrong for about everyone and forcing them to specify the actual 
offload targets relevant to their use case.
So far it's the least bad and somewhat consistent approach I've seen.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

In D128090#3648984 , @tra wrote:

> At some point we should start consolidating the ways we can specify an 
> offload target and try to avoid adding new ones until then.

Agreed, that was my intention with making `--offload-arch` work for everything 
the same way in the new driver. The difference with CUDA / HIP and OpenMP right 
now is  the default behavior if nothing was given. For CUDA / HIP we just 
default the bound architecture to something like sm_35 and gfx803 I believe. 
For OpenMP we keep the bound architecture empty which signals us to check the 
value of `-march=` and use that if present, or default to something more 
intelligent. Right now there's `CLANG_OPENMP_NVPTX_DEFAULT_ARCH`, which is 
defined by CMake to be the architecture of the system used to build clang, and 
`amdgpu-arch` which is just a program that runs at compile time. I'm not sure 
if there would be a desire to make CUDA / HIP adhere to this as well with the 
new driver.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Artem Belevich via Phabricator via cfe-commits
tra added a comment.

> Overloading the meaning of `-march` might not work here. Typically `-march` 
> is just checked via `Args.getLastArg()`, so repeated uses just override the 
> last one. I'm not exactly sure what the expected use is however, maybe @tra 
> can help there. Although we could consider ones contained inside of 
> `-Xopenmp-target=` to be different. That being said, if we wanted to support 
> this I think the easiest way to do it would be to add handling for `-march` 
> in the source here 
> .

`-march` is just not flexible enough. E.g. AMDGPU has GPUs that have the same 
-march, but are considered to be different offloading targets.

At some point we should start consolidating the ways we can specify an offload 
target and try to avoid adding new ones until then.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a subscriber: tra.
jhuber6 added a comment.

In D128090#3648879 , @saiislam wrote:

> `-Xopenmp-target -march ` used to be the only option to target a specific sub 
> arch before `--offload-arch`. But, it doesn't support multiple archs.
> This patch relies on infra used by `--offload-arch` to support this verbose 
> method of specifying multiple archs.
>
> Use case: people already familiar with `-Xopenmp-target -march` option are 
> likely to use the same for multiple archs, until they learn about shorthand 
> representation, `--offload-arch`.

Overloading the meaning of `-march` might not work here. Typically `-march` is 
just checked via `Args.getLastArg()`, so repeated uses just override the last 
one. I'm not exactly sure what the expected use is however, maybe @tra can help 
there. Although we could consider ones contained inside of `-Xopenmp-target=` 
to be different. That being said, if we wanted to support this I think the 
easiest way to do it would be to add handling for `-march` in the source here 
.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam added a comment.

In D128090#3648210 , @jhuber6 wrote:

> Sorry never noticed this revision. The purpose of this patch seems to be 
> supporting something like this
>
>   clang input.c -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 
> -march=sm_70 -Xopenmp-target=nvptx64 -march=sm_80
>
> Right now the above works if you replace `-march=` with `--offload-arch=`. 
> Currently the offloading tools use a "bound" architecture to tie a specific 
> architecture with a job, which is what allows us to offload to multiple 
> architectures. If there is no bound architecture gives, we instead use the 
> `-march=` option, and if that is not present we derive it. It would be 
> possible to set the bound architecture via `-march` if we wanted to. But I'm 
> not sure if it's necessary given that it would just be an alternate syntax 
> for `--offload-arch=`.

`-Xopenmp-target -march ` used to be the only option to target a specific sub 
arch before `--offload-arch`. But, it doesn't support multiple archs.
This patch relies on infra used by `--offload-arch` to support this verbose 
method of specifying multiple archs.

Use case: people already familiar with `-Xopenmp-target -march` option are 
likely to use the same for multiple archs, until they learn about shorthand 
representation, `--offload-arch`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-07-13 Thread Joseph Huber via Phabricator via cfe-commits
jhuber6 added a comment.

Sorry never noticed this revision. The purpose of this patch seems to be 
supporting something like this

  clang input.c -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 
-march=sm_70 -Xopenmp-target=nvptx64 -march=sm_80

Right now the above works if you replace `-march=` with `--offload-arch=`. 
Currently the offloading tools use a "bound" architecture to tie a specific 
architecture with a job, which is what allows us to offload to multiple 
architectures. If there is no bound architecture gives, we instead use the 
`-march=` option, and if that is not present we derive it. It would be possible 
to set the bound architecture via `-march` if we wanted to. But I'm not sure if 
it's necessary given that it would just be an alternate syntax for 
`--offload-arch=`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-06-17 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam updated this revision to Diff 438004.
saiislam added a comment.

clang-formatted.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128090/new/

https://reviews.llvm.org/D128090

Files:
  clang/include/clang/Driver/Driver.h
  clang/lib/Driver/Driver.cpp


Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -722,6 +722,38 @@
   return RT;
 }
 
+bool Driver::GetTargetInfoFromMArch(
+Compilation &C, llvm::StringMap> &DerivedArchs) {
+  StringRef OpenMPTargetArch;
+  for (Arg *A : C.getInputArgs()) {
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+  StringRef OpenMPTargetTriple = StringRef(A->getValue(0));
+  llvm::Triple TargetTriple(OpenMPTargetTriple);
+
+  for (auto *V : A->getValues()) {
+StringRef VStr = StringRef(V);
+if (VStr.startswith("-march=") || VStr.startswith("--march=")) {
+  OpenMPTargetArch = VStr.split('=').second;
+  CudaArch Arch = StringToCudaArch(StringRef(OpenMPTargetArch));
+  if (Arch == CudaArch::UNKNOWN) {
+C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch)
+<< OpenMPTargetArch;
+C.setContainsError();
+return false;
+  }
+
+  if (!OpenMPTargetTriple.empty() && !OpenMPTargetArch.empty()) {
+DerivedArchs[OpenMPTargetTriple].insert(OpenMPTargetArch);
+  }
+}
+A->claim();
+  }
+}
+  }
+
+  return true;
+}
+
 void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
   InputList &Inputs) {
 
@@ -812,6 +844,10 @@
 << OpenMPTargets->getAsString(C.getInputArgs());
 return;
   }
+  // Process legacy option -fopenmp-targets -Xopenmp-target and -march
+  auto status = GetTargetInfoFromMArch(C, DerivedArchs);
+  if (!status)
+return;
   llvm::copy(OpenMPTargets->getValues(), 
std::back_inserter(OpenMPTriples));
 } else if (C.getInputArgs().hasArg(options::OPT_offload_arch_EQ) &&
!IsHIP && !IsCuda) {
Index: clang/include/clang/Driver/Driver.h
===
--- clang/include/clang/Driver/Driver.h
+++ clang/include/clang/Driver/Driver.h
@@ -412,6 +412,11 @@
   /// current compilation. Also, update the host tool chain kind accordingly.
   void CreateOffloadingDeviceToolChains(Compilation &C, InputList &Inputs);
 
+  /// GetTargetInfoFromMArch - extract sub-architecture from -march flag used
+  /// with -fopenmp-targets and -Xopenmp-target options.
+  bool GetTargetInfoFromMArch(
+  Compilation &C, llvm::StringMap> 
&DerivedArchs);
+
   /// BuildCompilation - Construct a compilation object for a command
   /// line argument vector.
   ///


Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -722,6 +722,38 @@
   return RT;
 }
 
+bool Driver::GetTargetInfoFromMArch(
+Compilation &C, llvm::StringMap> &DerivedArchs) {
+  StringRef OpenMPTargetArch;
+  for (Arg *A : C.getInputArgs()) {
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+  StringRef OpenMPTargetTriple = StringRef(A->getValue(0));
+  llvm::Triple TargetTriple(OpenMPTargetTriple);
+
+  for (auto *V : A->getValues()) {
+StringRef VStr = StringRef(V);
+if (VStr.startswith("-march=") || VStr.startswith("--march=")) {
+  OpenMPTargetArch = VStr.split('=').second;
+  CudaArch Arch = StringToCudaArch(StringRef(OpenMPTargetArch));
+  if (Arch == CudaArch::UNKNOWN) {
+C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch)
+<< OpenMPTargetArch;
+C.setContainsError();
+return false;
+  }
+
+  if (!OpenMPTargetTriple.empty() && !OpenMPTargetArch.empty()) {
+DerivedArchs[OpenMPTargetTriple].insert(OpenMPTargetArch);
+  }
+}
+A->claim();
+  }
+}
+  }
+
+  return true;
+}
+
 void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
   InputList &Inputs) {
 
@@ -812,6 +844,10 @@
 << OpenMPTargets->getAsString(C.getInputArgs());
 return;
   }
+  // Process legacy option -fopenmp-targets -Xopenmp-target and -march
+  auto status = GetTargetInfoFromMArch(C, DerivedArchs);
+  if (!status)
+return;
   llvm::copy(OpenMPTargets->getValues(), std::back_inserter(OpenMPTriples));
 } else if (C.getInputArgs().hasArg(options::OPT_offload_arch_EQ) &&
!IsHIP && !IsCuda) {
Index: clang/include/clang/Driver/Driver.h
===
--- clang/include/clang/Driver/Driver.h
++

[PATCH] D128090: [Clang][OpenMP] Process multi-arch compilation options given via -march

2022-06-17 Thread Saiyedul Islam via Phabricator via cfe-commits
saiislam created this revision.
saiislam added reviewers: jdoerfert, JonChesterfield, jhuber6, yaxunl.
Herald added a subscriber: guansong.
Herald added a project: All.
saiislam requested review of this revision.
Herald added subscribers: cfe-commits, sstefan1, MaskRay.
Herald added a project: clang.

Subarchitectures for multi-file compilation specified using -fopenmp-targets,
-Xopenmp-target, and -march were not getting added to the
 map `KnownArchs`.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D128090

Files:
  clang/include/clang/Driver/Driver.h
  clang/lib/Driver/Driver.cpp


Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -722,6 +722,37 @@
   return RT;
 }
 
+bool Driver::GetTargetInfoFromMArch(Compilation &C, 
llvm::StringMap> &DerivedArchs) {
+  StringRef OpenMPTargetArch;
+  for (Arg *A : C.getInputArgs()) {
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+  StringRef OpenMPTargetTriple = StringRef(A->getValue(0));
+  llvm::Triple TargetTriple(OpenMPTargetTriple);
+
+  for (auto *V : A->getValues()) {
+StringRef VStr = StringRef(V);
+if (VStr.startswith("-march=") || VStr.startswith("--march=")) {
+  OpenMPTargetArch = VStr.split('=').second;
+  CudaArch Arch = StringToCudaArch(StringRef(OpenMPTargetArch));
+  if (Arch == CudaArch::UNKNOWN) {
+C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch)
+<< OpenMPTargetArch;
+C.setContainsError();
+return false;
+  }
+
+  if (!OpenMPTargetTriple.empty() && !OpenMPTargetArch.empty()) {
+DerivedArchs[OpenMPTargetTriple].insert(OpenMPTargetArch);
+  }
+}
+A->claim();
+  }
+}
+  }
+
+  return true;
+}
+
 void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
   InputList &Inputs) {
 
@@ -812,6 +843,10 @@
 << OpenMPTargets->getAsString(C.getInputArgs());
 return;
   }
+  // Process legacy option -fopenmp-targets -Xopenmp-target and -march
+  auto status = GetTargetInfoFromMArch(C, DerivedArchs);
+  if (!status)
+return;
   llvm::copy(OpenMPTargets->getValues(), 
std::back_inserter(OpenMPTriples));
 } else if (C.getInputArgs().hasArg(options::OPT_offload_arch_EQ) &&
!IsHIP && !IsCuda) {
Index: clang/include/clang/Driver/Driver.h
===
--- clang/include/clang/Driver/Driver.h
+++ clang/include/clang/Driver/Driver.h
@@ -412,6 +412,10 @@
   /// current compilation. Also, update the host tool chain kind accordingly.
   void CreateOffloadingDeviceToolChains(Compilation &C, InputList &Inputs);
 
+  /// GetTargetInfoFromMArch - extract sub-architecture from -march flag used
+  /// with -fopenmp-targets and -Xopenmp-target options.
+  bool GetTargetInfoFromMArch(Compilation &C, 
llvm::StringMap> &DerivedArchs);
+
   /// BuildCompilation - Construct a compilation object for a command
   /// line argument vector.
   ///


Index: clang/lib/Driver/Driver.cpp
===
--- clang/lib/Driver/Driver.cpp
+++ clang/lib/Driver/Driver.cpp
@@ -722,6 +722,37 @@
   return RT;
 }
 
+bool Driver::GetTargetInfoFromMArch(Compilation &C, llvm::StringMap> &DerivedArchs) {
+  StringRef OpenMPTargetArch;
+  for (Arg *A : C.getInputArgs()) {
+if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+  StringRef OpenMPTargetTriple = StringRef(A->getValue(0));
+  llvm::Triple TargetTriple(OpenMPTargetTriple);
+
+  for (auto *V : A->getValues()) {
+StringRef VStr = StringRef(V);
+if (VStr.startswith("-march=") || VStr.startswith("--march=")) {
+  OpenMPTargetArch = VStr.split('=').second;
+  CudaArch Arch = StringToCudaArch(StringRef(OpenMPTargetArch));
+  if (Arch == CudaArch::UNKNOWN) {
+C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch)
+<< OpenMPTargetArch;
+C.setContainsError();
+return false;
+  }
+
+  if (!OpenMPTargetTriple.empty() && !OpenMPTargetArch.empty()) {
+DerivedArchs[OpenMPTargetTriple].insert(OpenMPTargetArch);
+  }
+}
+A->claim();
+  }
+}
+  }
+
+  return true;
+}
+
 void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
   InputList &Inputs) {
 
@@ -812,6 +843,10 @@
 << OpenMPTargets->getAsString(C.getInputArgs());
 return;
   }
+  // Process legacy option -fopenmp-targets -Xopenmp-target and -march
+  auto status = GetTargetInfoFromMArch(C, DerivedArchs);
+  if (!status)
+return;
   llvm::copy(OpenMPTargets->getValues(), std::back_inserter