[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-05 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2738859 , @estewart08 wrote:

> In D99432#2736981 , @ABataev wrote:
>
>> In D99432#2736970 , @estewart08 
>> wrote:
>>
>>> In D99432#2728788 , @ABataev wrote:
>>>
 In D99432#2726997 , @estewart08 
 wrote:

> In D99432#2726845 , @ABataev 
> wrote:
>
>> In D99432#2726588 , @estewart08 
>> wrote:
>>
>>> In D99432#2726391 , @ABataev 
>>> wrote:
>>>
 In D99432#2726337 , 
 @estewart08 wrote:

> In D99432#2726060 , @ABataev 
> wrote:
>
>> In D99432#2726050 , 
>> @estewart08 wrote:
>>
>>> In D99432#2726025 , 
>>> @ABataev wrote:
>>>
 In D99432#2726019 , 
 @estewart08 wrote:

> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do 
> not see how this helps SPMD mode with team privatization of 
> declarations in-between target teams and parallel regions.

 Did you try the reproducer with the applied patch?
>>>
>>> Yes, I still saw the test fail, although it was not with latest 
>>> llvm-project. Are you saying the reproducer passes for you?
>>
>> I don't have CUDA installed but from what I see in the LLVM IR it 
>> shall pass. Do you have a debug log? Does it crash or produce 
>> incorrect results?
>
> This is on an AMDGPU but I assume the behavior would be similar for 
> NVPTX.
>
> It produces incorrect/incomplete results in the dist[0] index after a 
> manual reduction and in turn the final global gpu_results array is 
> incorrect.
> When thread 0 does a reduction into dist[0], it has no knowledge of 
> dist[1] having been updated by thread 1, which tells me the array is 
> still thread-private.
> Adding some printfs and looking at one team's output:
>
> SPMD
>
>   Thread 0: dist[0]: 1
>   Thread 0: dist[1]: 0  // This should be 1
>   After reduction into dist[0]: 1  // This should be 2
>   gpu_results = [1,1]  // [2,2] expected
>
> Generic Mode:
>
>   Thread 0: dist[0]: 1
>   Thread 0: dist[1]: 1   
>   After reduction into dist[0]: 2
>   gpu_results = [2,2]

 Hmm, I would expect a crash if the array was allocated in the local 
 memory. Could you try to add some more printfs (with data and 
 addresses of the array) to check the results? Maybe there is a data 
 race somewhere in the code?
>>>
>>> As a reminder, each thread updates a unique index in the dist array and 
>>> each team updates a unique index in gpu_results.
>>>
>>> SPMD - shows each thread has a unique address for dist array
>>>
>>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>>   
>>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>>   
>>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>>>   --
>>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>>   
>>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>>   
>>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>>>   
>>>   gpu_results[0]: 1
>>>   gpu_results[1]: 1
>>>
>>> Generic - shows each team shares dist array address amongst threads
>>>
>>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>>   
>>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>>   
>>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>>>   --
>>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>>   
>>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-05 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2736981 , @ABataev wrote:

> In D99432#2736970 , @estewart08 
> wrote:
>
>> In D99432#2728788 , @ABataev wrote:
>>
>>> In D99432#2726997 , @estewart08 
>>> wrote:
>>>
 In D99432#2726845 , @ABataev 
 wrote:

> In D99432#2726588 , @estewart08 
> wrote:
>
>> In D99432#2726391 , @ABataev 
>> wrote:
>>
>>> In D99432#2726337 , 
>>> @estewart08 wrote:
>>>
 In D99432#2726060 , @ABataev 
 wrote:

> In D99432#2726050 , 
> @estewart08 wrote:
>
>> In D99432#2726025 , 
>> @ABataev wrote:
>>
>>> In D99432#2726019 , 
>>> @estewart08 wrote:
>>>
 In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do 
 not see how this helps SPMD mode with team privatization of 
 declarations in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although it was not with latest 
>> llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed but from what I see in the LLVM IR it 
> shall pass. Do you have a debug log? Does it crash or produce 
> incorrect results?

 This is on an AMDGPU but I assume the behavior would be similar for 
 NVPTX.

 It produces incorrect/incomplete results in the dist[0] index after a 
 manual reduction and in turn the final global gpu_results array is 
 incorrect.
 When thread 0 does a reduction into dist[0], it has no knowledge of 
 dist[1] having been updated by thread 1, which tells me the array is 
 still thread-private.
 Adding some printfs and looking at one team's output:

 SPMD

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 0  // This should be 1
   After reduction into dist[0]: 1  // This should be 2
   gpu_results = [1,1]  // [2,2] expected

 Generic Mode:

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 1   
   After reduction into dist[0]: 2
   gpu_results = [2,2]
>>>
>>> Hmm, I would expect a crash if the array was allocated in the local 
>>> memory. Could you try to add some more printfs (with data and addresses 
>>> of the array) to check the results? Maybe there is a data race 
>>> somewhere in the code?
>>
>> As a reminder, each thread updates a unique index in the dist array and 
>> each team updates a unique index in gpu_results.
>>
>> SPMD - shows each thread has a unique address for dist array
>>
>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>>   --
>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>>   
>>   gpu_results[0]: 1
>>   gpu_results[1]: 1
>>
>> Generic - shows each team shares dist array address amongst threads
>>
>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>>   --
>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 2
>>   Team 1 Thread 0: gpu_results address: 0x7fabc500
>
> 

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-04 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2736970 , @estewart08 wrote:

> In D99432#2728788 , @ABataev wrote:
>
>> In D99432#2726997 , @estewart08 
>> wrote:
>>
>>> In D99432#2726845 , @ABataev wrote:
>>>
 In D99432#2726588 , @estewart08 
 wrote:

> In D99432#2726391 , @ABataev 
> wrote:
>
>> In D99432#2726337 , @estewart08 
>> wrote:
>>
>>> In D99432#2726060 , @ABataev 
>>> wrote:
>>>
 In D99432#2726050 , 
 @estewart08 wrote:

> In D99432#2726025 , @ABataev 
> wrote:
>
>> In D99432#2726019 , 
>> @estewart08 wrote:
>>
>>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do 
>>> not see how this helps SPMD mode with team privatization of 
>>> declarations in-between target teams and parallel regions.
>>
>> Did you try the reproducer with the applied patch?
>
> Yes, I still saw the test fail, although it was not with latest 
> llvm-project. Are you saying the reproducer passes for you?

 I don't have CUDA installed but from what I see in the LLVM IR it 
 shall pass. Do you have a debug log? Does it crash or produce 
 incorrect results?
>>>
>>> This is on an AMDGPU but I assume the behavior would be similar for 
>>> NVPTX.
>>>
>>> It produces incorrect/incomplete results in the dist[0] index after a 
>>> manual reduction and in turn the final global gpu_results array is 
>>> incorrect.
>>> When thread 0 does a reduction into dist[0], it has no knowledge of 
>>> dist[1] having been updated by thread 1, which tells me the array is 
>>> still thread-private.
>>> Adding some printfs and looking at one team's output:
>>>
>>> SPMD
>>>
>>>   Thread 0: dist[0]: 1
>>>   Thread 0: dist[1]: 0  // This should be 1
>>>   After reduction into dist[0]: 1  // This should be 2
>>>   gpu_results = [1,1]  // [2,2] expected
>>>
>>> Generic Mode:
>>>
>>>   Thread 0: dist[0]: 1
>>>   Thread 0: dist[1]: 1   
>>>   After reduction into dist[0]: 2
>>>   gpu_results = [2,2]
>>
>> Hmm, I would expect a crash if the array was allocated in the local 
>> memory. Could you try to add some more printfs (with data and addresses 
>> of the array) to check the results? Maybe there is a data race somewhere 
>> in the code?
>
> As a reminder, each thread updates a unique index in the dist array and 
> each team updates a unique index in gpu_results.
>
> SPMD - shows each thread has a unique address for dist array
>
>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>   
>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>   
>   Team 0 Thread 0: After reduction into dist[0]: 1
>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>   --
>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>   
>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>   
>   Team 1 Thread 0: After reduction into dist[0]: 1
>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>   
>   gpu_results[0]: 1
>   gpu_results[1]: 1
>
> Generic - shows each team shares dist array address amongst threads
>
>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>   
>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>   
>   Team 0 Thread 0: After reduction into dist[0]: 2
>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>   --
>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>   
>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>   
>   Team 1 Thread 0: After reduction into dist[0]: 2
>   Team 1 Thread 0: gpu_results address: 0x7fabc500

 Could you check if it works with the 
 `-fno-openmp-cuda-parallel-target-regions` option?
>>>
>>> Unfortunately that crashes:
>>> llvm-project/llvm/lib/IR/Instructions.cpp:495: void 
>>> 

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-05-04 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2728788 , @ABataev wrote:

> In D99432#2726997 , @estewart08 
> wrote:
>
>> In D99432#2726845 , @ABataev wrote:
>>
>>> In D99432#2726588 , @estewart08 
>>> wrote:
>>>
 In D99432#2726391 , @ABataev 
 wrote:

> In D99432#2726337 , @estewart08 
> wrote:
>
>> In D99432#2726060 , @ABataev 
>> wrote:
>>
>>> In D99432#2726050 , 
>>> @estewart08 wrote:
>>>
 In D99432#2726025 , @ABataev 
 wrote:

> In D99432#2726019 , 
> @estewart08 wrote:
>
>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do 
>> not see how this helps SPMD mode with team privatization of 
>> declarations in-between target teams and parallel regions.
>
> Did you try the reproducer with the applied patch?

 Yes, I still saw the test fail, although it was not with latest 
 llvm-project. Are you saying the reproducer passes for you?
>>>
>>> I don't have CUDA installed but from what I see in the LLVM IR it shall 
>>> pass. Do you have a debug log? Does it crash or produce incorrect 
>>> results?
>>
>> This is on an AMDGPU but I assume the behavior would be similar for 
>> NVPTX.
>>
>> It produces incorrect/incomplete results in the dist[0] index after a 
>> manual reduction and in turn the final global gpu_results array is 
>> incorrect.
>> When thread 0 does a reduction into dist[0], it has no knowledge of 
>> dist[1] having been updated by thread 1, which tells me the array is 
>> still thread-private.
>> Adding some printfs and looking at one team's output:
>>
>> SPMD
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 0  // This should be 1
>>   After reduction into dist[0]: 1  // This should be 2
>>   gpu_results = [1,1]  // [2,2] expected
>>
>> Generic Mode:
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 1   
>>   After reduction into dist[0]: 2
>>   gpu_results = [2,2]
>
> Hmm, I would expect a crash if the array was allocated in the local 
> memory. Could you try to add some more printfs (with data and addresses 
> of the array) to check the results? Maybe there is a data race somewhere 
> in the code?

 As a reminder, each thread updates a unique index in the dist array and 
 each team updates a unique index in gpu_results.

 SPMD - shows each thread has a unique address for dist array

   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
   
   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
   
   Team 0 Thread 0: After reduction into dist[0]: 1
   Team 0 Thread 0: gpu_results address: 0x7f92a500
   --
   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
   
   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
   
   Team 1 Thread 0: After reduction into dist[0]: 1
   Team 1 Thread 0: gpu_results address: 0x7f92a500
   
   gpu_results[0]: 1
   gpu_results[1]: 1

 Generic - shows each team shares dist array address amongst threads

   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
   
   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
   
   Team 0 Thread 0: After reduction into dist[0]: 2
   Team 0 Thread 0: gpu_results address: 0x7fabc500
   --
   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
   
   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
   
   Team 1 Thread 0: After reduction into dist[0]: 2
   Team 1 Thread 0: gpu_results address: 0x7fabc500
>>>
>>> Could you check if it works with the 
>>> `-fno-openmp-cuda-parallel-target-regions` option?
>>
>> Unfortunately that crashes:
>> llvm-project/llvm/lib/IR/Instructions.cpp:495: void 
>> llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, 
>> llvm::ArrayRef<llvm::Value*>, 
>> llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): 
>> Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == 
>> Args[i]->getType()) && "Calling a 

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-30 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2726997 , @estewart08 wrote:

> In D99432#2726845 , @ABataev wrote:
>
>> In D99432#2726588 , @estewart08 
>> wrote:
>>
>>> In D99432#2726391 , @ABataev wrote:
>>>
 In D99432#2726337 , @estewart08 
 wrote:

> In D99432#2726060 , @ABataev 
> wrote:
>
>> In D99432#2726050 , @estewart08 
>> wrote:
>>
>>> In D99432#2726025 , @ABataev 
>>> wrote:
>>>
 In D99432#2726019 , 
 @estewart08 wrote:

> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not 
> see how this helps SPMD mode with team privatization of declarations 
> in-between target teams and parallel regions.

 Did you try the reproducer with the applied patch?
>>>
>>> Yes, I still saw the test fail, although it was not with latest 
>>> llvm-project. Are you saying the reproducer passes for you?
>>
>> I don't have CUDA installed but from what I see in the LLVM IR it shall 
>> pass. Do you have a debug log? Does it crash or produce incorrect 
>> results?
>
> This is on an AMDGPU but I assume the behavior would be similar for NVPTX.
>
> It produces incorrect/incomplete results in the dist[0] index after a 
> manual reduction and in turn the final global gpu_results array is 
> incorrect.
> When thread 0 does a reduction into dist[0], it has no knowledge of 
> dist[1] having been updated by thread 1, which tells me the array is 
> still thread-private.
> Adding some printfs and looking at one team's output:
>
> SPMD
>
>   Thread 0: dist[0]: 1
>   Thread 0: dist[1]: 0  // This should be 1
>   After reduction into dist[0]: 1  // This should be 2
>   gpu_results = [1,1]  // [2,2] expected
>
> Generic Mode:
>
>   Thread 0: dist[0]: 1
>   Thread 0: dist[1]: 1   
>   After reduction into dist[0]: 2
>   gpu_results = [2,2]

 Hmm, I would expect a crash if the array was allocated in the local 
 memory. Could you try to add some more printfs (with data and addresses of 
 the array) to check the results? Maybe there is a data race somewhere in 
 the code?
>>>
>>> As a reminder, each thread updates a unique index in the dist array and 
>>> each team updates a unique index in gpu_results.
>>>
>>> SPMD - shows each thread has a unique address for dist array
>>>
>>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>>   
>>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>>   
>>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>>>   --
>>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>>   
>>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>>   
>>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>>>   
>>>   gpu_results[0]: 1
>>>   gpu_results[1]: 1
>>>
>>> Generic - shows each team shares dist array address amongst threads
>>>
>>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>>   
>>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>>   
>>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>>>   --
>>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>>   
>>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>>>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>>>   
>>>   Team 1 Thread 0: After reduction into dist[0]: 2
>>>   Team 1 Thread 0: gpu_results address: 0x7fabc500
>>
>> Could you check if it works with the 
>> `-fno-openmp-cuda-parallel-target-regions` option?
>
> Unfortunately that crashes:
> llvm-project/llvm/lib/IR/Instructions.cpp:495: void 
> llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, 
> llvm::ArrayRef<llvm::Value*>, 
> llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): 
> Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == 
> Args[i]->getType()) && "Calling a function with a bad signature!"' failed.

Hmm, could you provide a full stack trace?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726845 , @ABataev wrote:

> In D99432#2726588 , @estewart08 
> wrote:
>
>> In D99432#2726391 , @ABataev wrote:
>>
>>> In D99432#2726337 , @estewart08 
>>> wrote:
>>>
 In D99432#2726060 , @ABataev 
 wrote:

> In D99432#2726050 , @estewart08 
> wrote:
>
>> In D99432#2726025 , @ABataev 
>> wrote:
>>
>>> In D99432#2726019 , 
>>> @estewart08 wrote:
>>>
 In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not 
 see how this helps SPMD mode with team privatization of declarations 
 in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although it was not with latest 
>> llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed but from what I see in the LLVM IR it shall 
> pass. Do you have a debug log? Does it crash or produce incorrect 
> results?

 This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

 It produces incorrect/incomplete results in the dist[0] index after a 
 manual reduction and in turn the final global gpu_results array is 
 incorrect.
 When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
 having been updated by thread 1, which tells me the array is still 
 thread-private.
 Adding some printfs and looking at one team's output:

 SPMD

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 0  // This should be 1
   After reduction into dist[0]: 1  // This should be 2
   gpu_results = [1,1]  // [2,2] expected

 Generic Mode:

   Thread 0: dist[0]: 1
   Thread 0: dist[1]: 1   
   After reduction into dist[0]: 2
   gpu_results = [2,2]
>>>
>>> Hmm, I would expect a crash if the array was allocated in the local memory. 
>>> Could you try to add some more printfs (with data and addresses of the 
>>> array) to check the results? Maybe there is a data race somewhere in the 
>>> code?
>>
>> As a reminder, each thread updates a unique index in the dist array and each 
>> team updates a unique index in gpu_results.
>>
>> SPMD - shows each thread has a unique address for dist array
>>
>>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 1
>>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>>   --
>>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 1
>>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>>   
>>   gpu_results[0]: 1
>>   gpu_results[1]: 1
>>
>> Generic - shows each team shares dist array address amongst threads
>>
>>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>>   
>>   Team 0 Thread 0: After reduction into dist[0]: 2
>>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>>   --
>>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>>   
>>   Team 1 Thread 0: After reduction into dist[0]: 2
>>   Team 1 Thread 0: gpu_results address: 0x7fabc500
>
> Could you check if it works with the 
> `-fno-openmp-cuda-parallel-target-regions` option?

Unfortunately that crashes:
llvm-project/llvm/lib/IR/Instructions.cpp:495: void 
llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, 
llvm::ArrayRef<llvm::Value*>, 
llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): 
Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == 
Args[i]->getType()) && "Calling a function with a bad signature!"' failed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2726588 , @estewart08 wrote:

> In D99432#2726391 , @ABataev wrote:
>
>> In D99432#2726337 , @estewart08 
>> wrote:
>>
>>> In D99432#2726060 , @ABataev wrote:
>>>
 In D99432#2726050 , @estewart08 
 wrote:

> In D99432#2726025 , @ABataev 
> wrote:
>
>> In D99432#2726019 , @estewart08 
>> wrote:
>>
>>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not 
>>> see how this helps SPMD mode with team privatization of declarations 
>>> in-between target teams and parallel regions.
>>
>> Did you try the reproducer with the applied patch?
>
> Yes, I still saw the test fail, although it was not with latest 
> llvm-project. Are you saying the reproducer passes for you?

 I don't have CUDA installed but from what I see in the LLVM IR it shall 
 pass. Do you have a debug log? Does it crash or produce incorrect 
 results?
>>>
>>> This is on an AMDGPU but I assume the behavior would be similar for NVPTX.
>>>
>>> It produces incorrect/incomplete results in the dist[0] index after a 
>>> manual reduction and in turn the final global gpu_results array is 
>>> incorrect.
>>> When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
>>> having been updated by thread 1, which tells me the array is still 
>>> thread-private.
>>> Adding some printfs and looking at one team's output:
>>>
>>> SPMD
>>>
>>>   Thread 0: dist[0]: 1
>>>   Thread 0: dist[1]: 0  // This should be 1
>>>   After reduction into dist[0]: 1  // This should be 2
>>>   gpu_results = [1,1]  // [2,2] expected
>>>
>>> Generic Mode:
>>>
>>>   Thread 0: dist[0]: 1
>>>   Thread 0: dist[1]: 1   
>>>   After reduction into dist[0]: 2
>>>   gpu_results = [2,2]
>>
>> Hmm, I would expect a crash if the array was allocated in the local memory. 
>> Could you try to add some more printfs (with data and addresses of the 
>> array) to check the results? Maybe there is a data race somewhere in the 
>> code?
>
> As a reminder, each thread updates a unique index in the dist array and each 
> team updates a unique index in gpu_results.
>
> SPMD - shows each thread has a unique address for dist array
>
>   Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
>   Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
>   
>   Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
>   Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
>   
>   Team 0 Thread 0: After reduction into dist[0]: 1
>   Team 0 Thread 0: gpu_results address: 0x7f92a500
>   --
>   Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
>   Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
>   
>   Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
>   Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
>   
>   Team 1 Thread 0: After reduction into dist[0]: 1
>   Team 1 Thread 0: gpu_results address: 0x7f92a500
>   
>   gpu_results[0]: 1
>   gpu_results[1]: 1
>
> Generic - shows each team shares dist array address amongst threads
>
>   Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
>   Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
>   
>   Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
>   Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
>   
>   Team 0 Thread 0: After reduction into dist[0]: 2
>   Team 0 Thread 0: gpu_results address: 0x7fabc500
>   --
>   Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
>   Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
>   
>   Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
>   Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
>   
>   Team 1 Thread 0: After reduction into dist[0]: 2
>   Team 1 Thread 0: gpu_results address: 0x7fabc500

Could you check if it works with the `-fno-openmp-cuda-parallel-target-regions` 
option?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726391 , @ABataev wrote:

> In D99432#2726337 , @estewart08 
> wrote:
>
>> In D99432#2726060 , @ABataev wrote:
>>
>>> In D99432#2726050 , @estewart08 
>>> wrote:
>>>
 In D99432#2726025 , @ABataev 
 wrote:

> In D99432#2726019 , @estewart08 
> wrote:
>
>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not 
>> see how this helps SPMD mode with team privatization of declarations 
>> in-between target teams and parallel regions.
>
> Did you try the reproducer with the applied patch?

 Yes, I still saw the test fail, although it was not with latest 
 llvm-project. Are you saying the reproducer passes for you?
>>>
>>> I don't have CUDA installed but from what I see in the LLVM IR it shall 
>>> pass. Do you have a debug log? Does it crash or produce incorrect 
>>> results?
>>
>> This is on an AMDGPU but I assume the behavior would be similar for NVPTX.
>>
>> It produces incorrect/incomplete results in the dist[0] index after a manual 
>> reduction and in turn the final global gpu_results array is incorrect.
>> When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
>> having been updated by thread 1, which tells me the array is still 
>> thread-private.
>> Adding some printfs and looking at one team's output:
>>
>> SPMD
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 0  // This should be 1
>>   After reduction into dist[0]: 1  // This should be 2
>>   gpu_results = [1,1]  // [2,2] expected
>>
>> Generic Mode:
>>
>>   Thread 0: dist[0]: 1
>>   Thread 0: dist[1]: 1   
>>   After reduction into dist[0]: 2
>>   gpu_results = [2,2]
>
> Hmm, I would expect a crash if the array was allocated in the local memory. 
> Could you try to add some more printfs (with data and addresses of the array) 
> to check the results? Maybe there is a data race somewhere in the code?

As a reminder, each thread updates a unique index in the dist array and each 
team updates a unique index in gpu_results.

SPMD - shows each thread has a unique address for dist array

  Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
  Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
  
  Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
  Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
  
  Team 0 Thread 0: After reduction into dist[0]: 1
  Team 0 Thread 0: gpu_results address: 0x7f92a500
  --
  Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
  Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
  
  Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
  Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
  
  Team 1 Thread 0: After reduction into dist[0]: 1
  Team 1 Thread 0: gpu_results address: 0x7f92a500
  
  gpu_results[0]: 1
  gpu_results[1]: 1

Generic - shows each team shares dist array address amongst threads

  Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
  Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
  
  Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
  Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
  
  Team 0 Thread 0: After reduction into dist[0]: 2
  Team 0 Thread 0: gpu_results address: 0x7fabc500
  --
  Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
  Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
  
  Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
  Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
  
  Team 1 Thread 0: After reduction into dist[0]: 2
  Team 1 Thread 0: gpu_results address: 0x7fabc500
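
For reference, instrumentation along these lines produces the kind of per-thread 
value/address dump shown above. This is a hedged sketch, not the PR48851 
reproducer itself: the names `dist` and `gpu_results` and the per-thread/per-team 
indexing come from the output above, while the team/thread counts and everything 
else are illustrative.

  // Hedged sketch of the printf instrumentation, not the actual reproducer.
  // Each thread prints its view of dist[] together with addresses, so
  // thread-private copies (distinct addresses per thread) can be told apart
  // from a single team-shared copy (one address per team).
  #include <omp.h>
  #include <stdio.h>

  #define TEAMS 2
  #define THREADS 2

  int main() {
    int gpu_results[TEAMS] = {0};
  #pragma omp target teams num_teams(TEAMS) map(tofrom : gpu_results)
    {
      int dist[THREADS] = {0}; // declared between target teams and parallel
  #pragma omp parallel num_threads(THREADS)
      {
        int team = omp_get_team_num();
        int tid = omp_get_thread_num();
        dist[tid] = 1; // each thread writes its own slot
        printf("Team %d Thread %d: dist[0]: %d, %p\n", team, tid, dist[0],
               (void *)&dist[0]);
        printf("Team %d Thread %d: dist[1]: %d, %p\n", team, tid, dist[1],
               (void *)&dist[1]);
      }
      // Manual reduction by the team's initial thread; only correct if all
      // threads of the team wrote into the same dist[].
      dist[0] += dist[1];
      printf("Team %d Thread 0: After reduction into dist[0]: %d\n",
             omp_get_team_num(), dist[0]);
      gpu_results[omp_get_team_num()] = dist[0];
    }
    for (int t = 0; t < TEAMS; ++t)
      printf("gpu_results[%d]: %d\n", t, gpu_results[t]); // expected 2, 2
    return 0;
  }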


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2726337 , @estewart08 wrote:

> In D99432#2726060 , @ABataev wrote:
>
>> In D99432#2726050 , @estewart08 
>> wrote:
>>
>>> In D99432#2726025 , @ABataev wrote:
>>>
 In D99432#2726019 , @estewart08 
 wrote:

> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see 
> how this helps SPMD mode with team privatization of declarations 
> in-between target teams and parallel regions.

 Did you try the reproducer with the applied patch?
>>>
>>> Yes, I still saw the test fail, although it was not with latest 
>>> llvm-project. Are you saying the reproducer passes for you?
>>
>> I don't have CUDA installed but from what I see in the LLVM IR it shall 
>> pass. Do you have a debug log? Does it crash or produce incorrect results?
>
> This is on an AMDGPU but I assume the behavior would be similar for NVPTX.
>
> It produces incorrect/incomplete results in the dist[0] index after a manual 
> reduction and in turn the final global gpu_results array is incorrect.
> When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
> having been updated by thread 1, which tells me the array is still 
> thread-private.
> Adding some printfs and looking at one team's output:
>
> SPMD
>
>   Thread 0: dist[0]: 1
>   Thread 0: dist[1]: 0  // This should be 1
>   After reduction into dist[0]: 1  // This should be 2
>   gpu_results = [1,1]  // [2,2] expected
>
> Generic Mode:
>
>   Thread 0: dist[0]: 1
>   Thread 0: dist[1]: 1   
>   After reduction into dist[0]: 2
>   gpu_results = [2,2]

Hmm, I would expect a crash if the array was allocated in the local memory. 
Could you try to add some more printfs (with data and addresses of the array) 
to check the results? Maybe there is a data race somewhere in the code?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726060 , @ABataev wrote:

> In D99432#2726050 , @estewart08 
> wrote:
>
>> In D99432#2726025 , @ABataev wrote:
>>
>>> In D99432#2726019 , @estewart08 
>>> wrote:
>>>
 In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see 
 how this helps SPMD mode with team privatization of declarations 
 in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although it was not with latest 
>> llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed but from what I see in the LLVM IR it shall pass. 
> Do you have a debug log? Does it crash or produce incorrect results?

This is on an AMDGPU but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in the dist[0] index after a manual 
reduction and in turn the final global gpu_results array is incorrect.
When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] 
having been updated by thread 1, which tells me the array is still 
thread-private.
Adding some printfs and looking at one team's output:

SPMD

  Thread 0: dist[0]: 1
  Thread 0: dist[1]: 0  // This should be 1
  After reduction into dist[0]: 1  // This should be 2
  gpu_results = [1,1]  // [2,2] expected

Generic Mode:

  Thread 0: dist[0]: 1
  Thread 0: dist[1]: 1   
  After reduction into dist[0]: 2
  gpu_results = [2,2]


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2726050 , @estewart08 wrote:

> In D99432#2726025 , @ABataev wrote:
>
>> In D99432#2726019 , @estewart08 
>> wrote:
>>
>>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see 
>>> how this helps SPMD mode with team privatization of declarations in-between 
>>> target teams and parallel regions.
>>
>> Did you try the reproducer with the applied patch?
>
> Yes, I still saw the test fail, although it was not with latest llvm-project. 
> Are you saying the reproducer passes for you?

I don't have CUDA installed but from what I see in the LLVM IR it shall pass. 
Do you have a debug log? Does it crash or produce incorrect results?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In D99432#2726025 , @ABataev wrote:

> In D99432#2726019 , @estewart08 
> wrote:
>
>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see 
>> how this helps SPMD mode with team privatization of declarations in-between 
>> target teams and parallel regions.
>
> Did you try the reproducer with the applied patch?

Yes, I still saw the test fail, although it was not with latest llvm-project. 
Are you saying the reproducer passes for you?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2726019 , @estewart08 wrote:

> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how 
> this helps SPMD mode with team privatization of declarations in-between 
> target teams and parallel regions.

Did you try the reproducer with the applied patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-29 Thread Ethan Stewart via Phabricator via cfe-commits
estewart08 added a comment.

In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how 
this helps SPMD mode with team privatization of declarations in-between target 
teams and parallel regions.
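
To make the report concrete, the pattern in question looks roughly like the 
sketch below (a hedged sketch with illustrative names and sizes, not the 
bugzilla reproducer): `dist` is declared between the `target teams` construct 
and the parallel region, so it is a team-scope variable and every thread of a 
team must update the same copy for the manual reduction to be correct.

  // Hedged sketch of the PR48851 pattern; names and sizes are illustrative.
  #include <omp.h>

  #define N 2 // threads per team

  void run(int *gpu_results, int num_teams) {
  #pragma omp target teams num_teams(num_teams) map(from : gpu_results[:num_teams])
    {
      int dist[N] = {0};    // team-scope: must be globalized in SPMD mode
  #pragma omp parallel for num_threads(N)
      for (int i = 0; i < N; ++i)
        dist[i] = 1;        // each thread fills one element
      for (int i = 1; i < N; ++i)
        dist[0] += dist[i]; // manual reduction by the team's initial thread
      // 2 per team if dist is team-shared; 1 if each thread got its own copy.
      gpu_results[omp_get_team_num()] = dist[0];
    }
  }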


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-04-16 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev updated this revision to Diff 338107.
ABataev added a comment.

Rebase


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp

Index: clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
===
--- clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
+++ clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
@@ -28,16 +28,19 @@
 int main(int argc, char **argv) {
   int b[10], c[10], d[10];
 #pragma omp target teams map(tofrom:a)
+  {
+double escaped = 0;
 #pragma omp distribute parallel for firstprivate(b) lastprivate(c) if(a)
   for (int i= 0; i < argc; ++i)
-a = foo() + foo() + foo([i]) + foo([i]) + foo([i]);
+a = foo() + foo() + foo([i]) + foo([i]) + foo([i]) + escaped;
+  }
   return 0;
 }
 
 // SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }
 // SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = weak addrspace(3) global [[MEM_TY]] undef
 // SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* undef
-// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 40
+// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 48
 // SEQ-DAG: [[KERNEL_SHARED:@.+]] = internal unnamed_addr constant i16 1
 // CHECK-DAG: @__omp_offloading_{{.*}}_main_[[LINE:l.+]]_exec_mode = weak constant i8 0
 
@@ -47,9 +50,10 @@
 // SEQ: call void @__kmpc_get_team_static_memory(i16 1, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* [[SHARED_GLOBAL_RD]], i32 0, i32 0, i32 0) to i8*), i{{64|32}} [[SIZE]], i16 [[SHARED]], i8** addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))
 // SEQ: [[PTR:%.+]] = load i8*, i8* addrspace(3)* [[KERNEL_PTR]],
 // SEQ: [[GEP:%.+]] = getelementptr inbounds i8, i8* [[PTR]], i{{64|32}} 0
-// PAR: [[GEP:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 40, i16 1)
+// PAR: [[GEP:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 48, i16 1)
 // CHECK: [[STACK:%.+]] = bitcast i8* [[GEP]] to %struct._globalized_locals_ty*
-// CHECK: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 0
+// CHECK-DAG: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 1
+// CHECK-DAG: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 0
 // CHECK-NOT: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]],
 // CHECK: call void @__kmpc_for_static_init_4(
 
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -229,6 +229,7 @@
   llvm::SmallDenseMap<const ValueDecl *, const FieldDecl *> MappedDeclsFields;
   bool AllEscaped = false;
   bool IsForCombinedParallelRegion = false;
+  bool IsInSPMDKernel = false;
 
   void markAsEscaped(const ValueDecl *VD) {
 // Do not globalize declare target variables.
@@ -242,6 +243,9 @@
 // Variables captured by value must be globalized.
 if (auto *CSI = CGF.CapturedStmtInfo) {
   if (const FieldDecl *FD = CSI->lookup(cast<VarDecl>(VD))) {
+// Do not globalize captured vars in SPMD mode.
+if (IsInSPMDKernel)
+  return;
 // Check if need to capture the variable that was already captured by
 // value in the outer region.
 if (!IsForCombinedParallelRegion) {
@@ -351,9 +355,10 @@
 
 public:
   CheckVarsEscapingDeclContext(CodeGenFunction &CGF,
-   ArrayRef<const ValueDecl *> TeamsReductions)
-  : CGF(CGF), EscapedDecls(TeamsReductions.begin(), TeamsReductions.end()) {
-  }
+   ArrayRef<const ValueDecl *> TeamsReductions,
+   bool IsInSPMDKernel = false)
+  : CGF(CGF), EscapedDecls(TeamsReductions.begin(), TeamsReductions.end()),
+IsInSPMDKernel(IsInSPMDKernel) {}
   virtual ~CheckVarsEscapingDeclContext() = default;
   void VisitDeclStmt(const DeclStmt *S) {
 if (!S)
@@ -1631,65 +1636,30 @@
 OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {
   SourceLocation Loc = D.getBeginLoc();
 
-  const RecordDecl *GlobalizedRD = nullptr;
-  llvm::SmallVector LastPrivatesReductions;
-  llvm::SmallDenseMap MappedDeclsFields;
-  unsigned WarpSize = CGM.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
-  // Globalize team reductions variable unconditionally in all modes.
-  if (getExecutionMode() != CGOpenMPRuntimeGPU::EM_SPMD)
-getTeamsReductionVars(CGM.getContext(), D, LastPrivatesReductions);
-  if (getExecutionMode() == CGOpenMPRuntimeGPU::EM_SPMD) {
-

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-03-29 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2654025 , @jdoerfert wrote:

> In D99432#2653483 , @ABataev wrote:
>
>> In D99432#2653474 , @jdoerfert 
>> wrote:
>>
>>> Can we please always do the globalization? Even in the `target teams 
>>> distribute parallel for` case, you need it if a thread shares the address of 
>>> a local variable with the team and another thread uses it.
>>
>> Could you give a small example so I could better understand the problem?
>
> I didn't find my old example; this should do though:
> https://godbolt.org/z/b7axxzxEf
>
> On the host or host offload I see:
> Mine: 0, Other: 42
>
> On a GPU I see:
> CUDA error: Error when synchronizing stream. stream = 0x4294db40, 
> async info ptr = 0x7fffdd939838
> CUDA error: an illegal memory access was encountered
>
>> Shall we globalize the variable in SPMD mode if we pass it by reference/take 
>> address in any case?
>
> Yes. I think that is strictly speaking necessary. We should commit it 
> together with the patches that "undo" globalization though.
>
>>> There is no argument other than "doesn't escape" that Clang can make to 
>>> disprove globalization is needed, IMHO.

It would be better to implement this in a separate patch. Let's fix the bug 
first and then implement the common functionality for locals globalization in 
SPMD mode (probably controlled by the compiler option/flag).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-03-26 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

In D99432#2653483 , @ABataev wrote:

> In D99432#2653474 , @jdoerfert wrote:
>
>> Can we please always do the globalization? Even in the `target teams 
>> distribute parallel for` case, you need it if a thread shares the address of 
>> a local variable with the team and another thread uses it.
>
> Could you give a small example so I could better understand the problem?

I didn't find my old example; this should do though:
https://godbolt.org/z/En7To6xEW

On the host or host offload I see:
Mine: 0, Other: 42

On a GPU I see:
CUDA error: Error when synchronizing stream. stream = 0x4294db40, async 
info ptr = 0x7fffdd939838
CUDA error: an illegal memory access was encountered

> Shall we globalize the variable in SPMD mode if we pass it by reference/take 
> address in any case?

Yes. I think that is strictly speaking necessary. We should commit it together 
with the patches that "undo" globalization though.

>> There is no argument other than "doesn't escape" that Clang can make to 
>> disprove globalization is needed, IMHO.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-03-26 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev added a comment.

In D99432#2653474 , @jdoerfert wrote:

> Can we please always do the globalization? Even in the `target teams 
> distribute parallel for` case, you need it if a thread shares the address of a 
> local variable with the team and another thread uses it.

Could you give a small example so I could better understand the problem? Shall 
we globalize the variable in SPMD mode if we pass it by reference/take address 
in any case?

> There is no argument other than "doesn't escape" that Clang can make to 
> disprove globalization is needed, IMHO.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-03-26 Thread Johannes Doerfert via Phabricator via cfe-commits
jdoerfert added a comment.

Can we please always do the globalization? Even in the `target teams distribute 
parallel for` case, you need it if a thread shares the address of a local 
variable with the team and another thread uses it.
There is no argument other than "doesn't escape" that Clang can make to 
disprove globalization is needed, IMHO.
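
A hedged sketch of the kind of scenario being described (not the godbolt 
example above; all names are illustrative): one thread publishes the address of 
its stack local through a team-scope pointer and another thread dereferences 
it, which only works if both the pointer and the local end up in memory the 
whole team can access.

  // Hedged sketch, not the godbolt reproducer: thread 1 shares the address of
  // its local with the team, thread 0 reads through it. Without globalization
  // the address may point into thread-private memory.
  #include <omp.h>
  #include <stdio.h>

  int main() {
    int Mine = -1, Other = -1;
  #pragma omp target teams num_teams(1) map(tofrom : Mine, Other)
    {
      int *Escaped = nullptr;                  // team-scope slot for the address
  #pragma omp parallel num_threads(2) shared(Escaped)
      {
        int Local = omp_get_thread_num() * 42; // 0 on thread 0, 42 on thread 1
        if (omp_get_thread_num() == 1)
          Escaped = &Local;                    // the local escapes its thread
  #pragma omp barrier
        if (omp_get_thread_num() == 0) {
          Mine = Local;     // thread 0's own local
          Other = *Escaped; // thread 1's local, read through the shared pointer
        }
  #pragma omp barrier       // keep Local alive until thread 0 has read it
      }
    }
    printf("Mine: %d, Other: %d\n", Mine, Other); // host offload prints 0, 42
    return 0;
  }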


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432

[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

2021-03-26 Thread Alexey Bataev via Phabricator via cfe-commits
ABataev created this revision.
ABataev added a reviewer: jdoerfert.
Herald added subscribers: guansong, yaxunl.
ABataev requested review of this revision.
Herald added a subscriber: sstefan1.
Herald added a project: clang.

Need to perform general analysis on SPMD kernels to correctly identify
the variables that should be globalized because they escape their
declaration context.
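
In other words, a local "escapes" when its address becomes visible outside its 
declaration context, as in the hedged sketch below (the helper `use` is 
hypothetical; the test change below exercises the same idea with its `escaped` 
variable):

  // Hedged illustration only: because the address of `escaped` is taken inside
  // the parallel region, the variable escapes its teams-scope declaration
  // context and has to be globalized rather than kept in thread-private stack
  // storage. The concurrent updates are beside the point here.
  #pragma omp declare target
  static void use(double *p) { *p += 1.0; } // hypothetical helper
  #pragma omp end declare target

  void spmd_kernel(int n) {
  #pragma omp target teams
    {
      double escaped = 0; // teams-scope local, like `escaped` in the test below
  #pragma omp distribute parallel for
      for (int i = 0; i < n; ++i)
        use(&escaped);    // taking the address makes it escape
    }
  }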


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99432

Files:
  clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
  clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp

Index: clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
===
--- clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
+++ clang/test/OpenMP/nvptx_distribute_parallel_generic_mode_codegen.cpp
@@ -28,16 +28,19 @@
 int main(int argc, char **argv) {
   int b[10], c[10], d[10];
 #pragma omp target teams map(tofrom:a)
+  {
+double escaped = 0;
 #pragma omp distribute parallel for firstprivate(b) lastprivate(c) if(a)
   for (int i= 0; i < argc; ++i)
-a = foo() + foo() + foo([i]) + foo([i]) + foo([i]);
+a = foo() + foo() + foo([i]) + foo([i]) + foo([i]) + escaped;
+  }
   return 0;
 }
 
 // SEQ: [[MEM_TY:%.+]] = type { [128 x i8] }
 // SEQ-DAG: [[SHARED_GLOBAL_RD:@.+]] = weak addrspace(3) global [[MEM_TY]] undef
 // SEQ-DAG: [[KERNEL_PTR:@.+]] = internal addrspace(3) global i8* undef
-// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 40
+// SEQ-DAG: [[KERNEL_SIZE:@.+]] = internal unnamed_addr constant i{{64|32}} 48
 // SEQ-DAG: [[KERNEL_SHARED:@.+]] = internal unnamed_addr constant i16 1
 // CHECK-DAG: @__omp_offloading_{{.*}}_main_[[LINE:l.+]]_exec_mode = weak constant i8 0
 
@@ -47,9 +50,10 @@
 // SEQ: call void @__kmpc_get_team_static_memory(i16 1, i8* addrspacecast (i8 addrspace(3)* getelementptr inbounds ([[MEM_TY]], [[MEM_TY]] addrspace(3)* [[SHARED_GLOBAL_RD]], i32 0, i32 0, i32 0) to i8*), i{{64|32}} [[SIZE]], i16 [[SHARED]], i8** addrspacecast (i8* addrspace(3)* [[KERNEL_PTR]] to i8**))
 // SEQ: [[PTR:%.+]] = load i8*, i8* addrspace(3)* [[KERNEL_PTR]],
 // SEQ: [[GEP:%.+]] = getelementptr inbounds i8, i8* [[PTR]], i{{64|32}} 0
-// PAR: [[GEP:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 40, i16 1)
+// PAR: [[GEP:%.+]] = call i8* @__kmpc_data_sharing_push_stack(i{{32|64}} 48, i16 1)
 // CHECK: [[STACK:%.+]] = bitcast i8* [[GEP]] to %struct._globalized_locals_ty*
-// CHECK: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 0
+// CHECK-DAG: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 1
+// CHECK-DAG: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]], i{{32|64}} 0, i{{32|64}} 0
 // CHECK-NOT: getelementptr inbounds %struct._globalized_locals_ty, %struct._globalized_locals_ty* [[STACK]],
 // CHECK: call void @__kmpc_for_static_init_4(
 
Index: clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
===
--- clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -229,6 +229,7 @@
   llvm::SmallDenseMap<const ValueDecl *, const FieldDecl *> MappedDeclsFields;
   bool AllEscaped = false;
   bool IsForCombinedParallelRegion = false;
+  bool IsInSPMDKernel = false;
 
   void markAsEscaped(const ValueDecl *VD) {
 // Do not globalize declare target variables.
@@ -242,6 +243,9 @@
 // Variables captured by value must be globalized.
 if (auto *CSI = CGF.CapturedStmtInfo) {
   if (const FieldDecl *FD = CSI->lookup(cast<VarDecl>(VD))) {
+// Do not globalize captured vars in SPMD mode.
+if (IsInSPMDKernel)
+  return;
 // Check if need to capture the variable that was already captured by
 // value in the outer region.
 if (!IsForCombinedParallelRegion) {
@@ -351,9 +355,10 @@
 
 public:
   CheckVarsEscapingDeclContext(CodeGenFunction &CGF,
-   ArrayRef<const ValueDecl *> TeamsReductions)
-  : CGF(CGF), EscapedDecls(TeamsReductions.begin(), TeamsReductions.end()) {
-  }
+   ArrayRef<const ValueDecl *> TeamsReductions,
+   bool IsInSPMDKernel = false)
+  : CGF(CGF), EscapedDecls(TeamsReductions.begin(), TeamsReductions.end()),
+IsInSPMDKernel(IsInSPMDKernel) {}
   virtual ~CheckVarsEscapingDeclContext() = default;
   void VisitDeclStmt(const DeclStmt *S) {
 if (!S)
@@ -1631,65 +1636,30 @@
 OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {
   SourceLocation Loc = D.getBeginLoc();
 
-  const RecordDecl *GlobalizedRD = nullptr;
-  llvm::SmallVector LastPrivatesReductions;
-  llvm::SmallDenseMap MappedDeclsFields;
-  unsigned WarpSize = CGM.getTarget().getGridValue(llvm::omp::GV_Warp_Size);
-  // Globalize team reductions variable unconditionally in all