[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #23 from Hongtao.liu  ---
>  _813 = {_437, _448, _459, _470, _490, _501, _512, _523, _543, _554, _565,
> _576, _125, _143, _161, _179}; 

The cost of vec_construct in i386 backend is 64, calculated as 16 x 4

cut from i386.c
---
/* N element inserts into SSE vectors.  */ 
int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
---

From the perspective of pipeline latency, it seems ok, but from the perspective of
rtx_cost, it seems inaccurate, since the vector would be initialized as
---
vmovd   %eax, %xmm0
vpinsrb $1, 1(%rsi), %xmm0, %xmm0
vmovd   %eax, %xmm7
vpinsrb $1, 3(%rsi), %xmm7, %xmm7
vmovd   %eax, %xmm3
vpinsrb $1, 17(%rsi), %xmm3, %xmm3
vmovd   %eax, %xmm6
vpinsrb $1, 19(%rsi), %xmm6, %xmm6
vmovd   %eax, %xmm1
vpinsrb $1, 33(%rsi), %xmm1, %xmm1
vmovd   %eax, %xmm5
vpinsrb $1, 35(%rsi), %xmm5, %xmm5
vmovd   %eax, %xmm2
vpinsrb $1, 49(%rsi), %xmm2, %xmm2
vmovd   %eax, %xmm4
vpinsrb $1, 51(%rsi), %xmm4, %xmm4
vpunpcklwd  %xmm6, %xmm3, %xmm3
vpunpcklwd  %xmm4, %xmm2, %xmm2
vpunpcklwd  %xmm7, %xmm0, %xmm0
vpunpcklwd  %xmm5, %xmm1, %xmm1
vpunpckldq  %xmm2, %xmm1, %xmm1
vpunpckldq  %xmm3, %xmm0, %xmm0
vpunpcklqdq %xmm1, %xmm0, %xmm0
---

it's 16 "vector insert" + (4 + 2 + 1) "vector concat/permutation", so cost
should be 92(23 * 4).
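
The arithmetic above can be sketched in C. This is only an illustration, not the i386.c code: the function names are hypothetical, sse_op = 4 is inferred from the "16 x 4" figure, and the merge count assumes elements are first built into two-element partial vectors that are then combined in a binary tree (4 + 2 + 1 for 16 elements).

```c
#include <assert.h>

/* Formula quoted from i386.c: one sse_op per inserted element. */
int vec_construct_cost_inserts_only(int nelts, int sse_op)
{
    assert(nelts > 0 && sse_op > 0);
    return nelts * sse_op;
}

/* Hypothetical variant that also charges the unpack/concat operations.
   Assumption: nelts/2 partial vectors are merged pairwise, which takes
   (nelts/2 - 1) merges in total. */
int vec_construct_cost_with_merges(int nelts, int sse_op)
{
    int partial, merges;

    assert(nelts > 0 && sse_op > 0);
    partial = nelts / 2;
    merges = partial > 0 ? partial - 1 : 0;
    return (nelts + merges) * sse_op;
}
```

With nelts = 16 and sse_op = 4, the first function gives 64 and the second gives 92, matching the two figures in the comment.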

[Bug target/96789] x264: sub4x4_dct() improves when vectorization is disabled

2020-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96789

--- Comment #22 from Hongtao.liu  ---
> One of my workmates found that if we disable vectorization for SPEC2017
> 525.x264_r function sub4x4_dct in source file x264_src/common/dct.c with
> explicit function attribute __attribute__((optimize("no-tree-vectorize"))),
> it can speed up by 4%.

For CLX, if we disable slp vectorization in sub4x4_dct by 
__attribute__((optimize("no-tree-slp-vectorize"))), it can also speed up by 4%.

> Thanks Richi! Should we take care of this case? or neglect this kind of
> extension as "no instruction"? I was intent to handle it in target specific
> code, but it isn't recorded into cost vector while it seems too heavy to do
> the bb_info slp_instances revisits in finish_cost.

For the i386 backend, unsigned char --> unsigned short is not "no instruction", but
in this case
---
1033  _134 = MEM[(pixel *)pix1_295 + 2B];   
1034  _135 = (short unsigned int) _134;
---

It could be combined and optimized to 
---
movzbl  19(%rcx), %r8d
---

So, if an "unsigned char" variable is loaded from memory, then the conversion
would also be "no instruction". I'm not sure if the backend cost model could handle
such a situation.
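
A minimal C sketch of the pattern in the GIMPLE dump above: a byte load followed immediately by a widening conversion. The function name and the `pixel` typedef are assumptions for illustration. On x86-64 a compiler would typically fold the load and the zero extension into a single movzbl, though that depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char pixel;   /* assumed to mirror the `pixel` type in the dump */

/* Load one byte and widen it to unsigned short. The conversion is not
   free on its own, but it can be folded into a zero-extending load. */
unsigned short load_widened(const pixel *pix, size_t offset)
{
    assert(pix != NULL);
    return (unsigned short) pix[offset];
}
```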

[Bug c/97215] Possible fread() malfunction of GCC 7.3.0 (Windows)

2020-09-26 Thread sanmayce at sanmayce dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

--- Comment #5 from Georgi  ---
(In reply to Andrew Pinski from comment #3)
> fopen/fread/fwrite DO NOT come from GCC, but rather, in this case, from
> mingw.

Ugh, thanks, will alert them about this issue by giving the link to this
tracker.

[Bug c/97215] Possible fread() malfunction of GCC 7.3.0 (Windows)

2020-09-26 Thread sanmayce at sanmayce dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

--- Comment #4 from Georgi  ---
(In reply to Andrew Pinski from comment #2)
> You need b if you don't want \r\n to be turned into just \n.

At line 11,945 I use:

```
if ((fp = fopen(argv[1], "rb")) == NULL) {
    printf("Nakamichi: Can't open '%s' file.\n", argv[1]);
    exit(13);
}
```

As far as I have investigated, the problem is that fread() reads fewer bytes than
specified (around 860 fewer); after decompression I see those bytes as ASCII 000?!
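
For reference, a short read can be detected by checking fread()'s return value. The sketch below is not taken from the Nakamichi source and does not explain the failure reported here; `read_fully` and `READ_CHUNK` are hypothetical names, and reading in chunks is simply one way to avoid a single very large request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum { READ_CHUNK = 4096 };   /* hypothetical chunk size */

/* Read up to `want` bytes in chunks; return how many bytes arrived.
   A result smaller than `want` means EOF or an error occurred, which
   the caller can tell apart with feof() and ferror(). */
size_t read_fully(FILE *fp, unsigned char *buf, size_t want)
{
    size_t got = 0;

    assert(fp != NULL && buf != NULL);
    while (got < want) {
        size_t chunk = want - got;
        size_t n;

        if (chunk > READ_CHUNK)
            chunk = READ_CHUNK;
        n = fread(buf + got, 1, chunk, fp);
        got += n;
        if (n < chunk)          /* short read: stop and report */
            break;
    }
    return got;
}
```

A caller would compare the result with the expected file size and print both values when they differ, which makes a missing tail visible before the checksum is computed.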

[Bug c/97215] Possible fread() malfunction of GCC 7.3.0 (Windows)

2020-09-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

--- Comment #3 from Andrew Pinski  ---
fopen/fread/fwrite DO NOT come from GCC, but rather, in this case, from mingw.

[Bug c/97215] Possible fread() malfunction of GCC 7.3.0 (Windows)

2020-09-26 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski  ---
You need b if you don't want \r\n to be turned into just \n.

[Bug c/97215] Possible fread() malfunction of GCC 7.3.0 (Windows)

2020-09-26 Thread sanmayce at sanmayce dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

--- Comment #1 from Georgi  ---
Oops, here are the mentioned files:
www.sanmayce.com/Nakamichi/Satanichi_aka_Nakamichi_2020-Jun-09_BUG_ZEROED-END.zip

[Bug c/97215] New: Possible fread() malfunction of GCC 7.3.0 (Windows)

2020-09-26 Thread sanmayce at sanmayce dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97215

Bug ID: 97215
   Summary: Possible fread() malfunction of GCC 7.3.0 (Windows)
   Product: gcc
   Version: 7.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sanmayce at sanmayce dot com
  Target Milestone: ---

Hi, 
a C coder here.

Regarding a C source loading an entire 3.3GB file and checksumming it.

First, I use Intel v15.0 and GCC v7.3.0, on Windows 64bit.
To my dismay, I found that Intel's binary loads the file and reports the correct
checksum, whereas GCC's binary fails. After comparing the loaded content I saw
that GCC's binary loads the whole file into a malloc-ed pool except for the last ~860
bytes?!

If you need to reproduce the issue - the two binaries (GCC and Intel) and the C
source as well are here:
http://www.sanmayce.com/Nakamichi/Nakamichi_Kaidanji.zip

The file being loaded is the Human Genome:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_01405.28_GRCh38.p13/GCA_01405.28_GRCh38.p13_genomic.fna.gz

This bug never appeared with files of 1GB or less; my guess is that this is a
clue.

These are the files:

```
06/11/2020  09:16 AM 1,316,439 Nakamichi_Ryuugan-ditto-1TB_btree.c
06/15/2019  02:37 AM 3,313,087,324 NCBI_FTP_Homo_sapiens_(human)_GCA_01405.28_GRCh38.p13_genomic.fna
06/15/2019  02:37 AM 3,313,087,324 q
01/07/2018  05:26 PM   191,644 Satanichi_GCC730_64bit.exe
06/11/2020  09:16 AM   198,144 Satanichi_ICL150_64bit.exe
```

As you can see below, the same file is loaded differently into malloc-ed pool:

```
D:\Satanichi_aka_Nakamichi_2020-Jun-09>Satanichi_GCC730_64bit.exe q w 20 888 i
...
Allocating Source-Buffer 3,159 MB ...
Allocating Target-Buffer 3,191 MB ...
Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0xc1d4,3f7f
...
D:\Satanichi_aka_Nakamichi_2020-Jun-09>

D:\Satanichi_aka_Nakamichi_2020-Jun-09>Satanichi_ICL150_64bit.exe q w 20 888 i
Allocating Source-Buffer 3,159 MB ...
Allocating Target-Buffer 3,191 MB ...
Source-file-Hash(FNV1A_YoshimitsuTRIAD) = 0x81bd,fe4b
...
D:\Satanichi_aka_Nakamichi_2020-Jun-09>
```

If you need more info, I will add it...
I would very much like to know what causes this anomaly/bug.

Georgi

[Bug c/97208] [gcc 10.2.0] Microblaze regression

2020-09-26 Thread romain.naour at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97208

--- Comment #1 from Romain Naour  ---
Hello, 

I had to disable -ftree-loop-distribute-patterns while building the kernel on
microblaze (using -Os).

The regression has appeared since commit [1], which moved
-ftree-loop-distribute-patterns from -O3 to the -O2 (-Os) optimization level.

I guess this behavior change should be documented in the gcc 10 changes page
[2]?

[1]
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=5879ab5fafedc8f6f9bfe95a4cf8501b0df90edd

[2] https://gcc.gnu.org/gcc-10/changes.html

Best regards,
Romain
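
For context, -ftree-loop-distribute-patterns lets GCC replace simple fill or copy loops with calls to library functions such as memset. The report does not say which kernel code is affected, so the following is only a generic sketch of disabling the pass for a single function rather than for the whole build; `fill_bytes` and the `NO_LOOP_TO_LIBCALL` macro are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-only attribute; other compilers get an empty definition. */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_TO_LIBCALL \
    __attribute__((optimize("no-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* A byte-fill loop of the kind the pass may turn into a memset call.
   The attribute asks GCC to leave the loop as written. */
NO_LOOP_TO_LIBCALL
void *fill_bytes(void *dst, int value, size_t n)
{
    unsigned char *p = dst;
    size_t i;

    assert(dst != NULL || n == 0);
    for (i = 0; i < n; i++)
        p[i] = (unsigned char) value;
    return dst;
}
```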

[Bug fortran/97210] Intrinsic function get_team() does not work

2020-09-26 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97210

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P4
                 CC|                            |kargl at gcc dot gnu.org
   Last reconfirmed|                            |2020-09-26
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from kargl at gcc dot gnu.org ---
It seems the implementation of get_team() was wrong from its first appearance
in gfortran.

[Bug libstdc++/96817] __cxa_guard_acquire unsafe against dynamically loaded pthread

2020-09-26 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96817

--- Comment #17 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:e6923541fae5081b646f240d54de2a32e17a0382

commit r11-3484-ge6923541fae5081b646f240d54de2a32e17a0382
Author: Jonathan Wakely 
Date:   Sat Sep 26 20:32:36 2020 +0100

libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]

Glibc 2.32 adds a global variable that says whether the process is
single-threaded. We can use this to decide whether to elide atomic
operations, as a more precise and reliable indicator than
__gthread_active_p.

This means that guard variables for statics and reference counting in
shared_ptr can use less expensive, non-atomic ops even in processes that
are linked to libpthread, as long as no threads have been created yet.
It also means that we switch to using atomics if libpthread gets loaded
later via dlopen (this still isn't supported in general, for other
reasons).

We can't use __libc_single_threaded to replace __gthread_active_p
everywhere. If we replaced the uses of __gthread_active_p in std::mutex
then we would elide the pthread_mutex_lock in the code below, but not
the pthread_mutex_unlock:

  std::mutex m;
  m.lock();// pthread_mutex_lock
  std::thread t([]{}); // __libc_single_threaded = false
  t.join();
  m.unlock();  // pthread_mutex_unlock

We need the lock and unlock to use the same "is threading enabled"
predicate, and similarly for init/destroy pairs for mutexes and
condition variables, so that we don't try to release resources that were
never acquired.

There are other places that could use __libc_single_threaded, such as
_Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but
they can be changed later.

libstdc++-v3/ChangeLog:

PR libstdc++/96817
* include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()):
New function wrapping __libc_single_threaded if available.
(__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
* libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
(__cxa_guard_release): Likewise.
* testsuite/18_support/96817.cc: New test.
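
The idea in the commit message can be sketched in C. This is not the libstdc++ code: `is_single_threaded` and `refcount_add` are hypothetical names, and the sketch assumes glibc 2.32 or later provides `<sys/single_threaded.h>` with the `__libc_single_threaded` variable, falling back to always using atomics otherwise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__has_include)
# if __has_include(<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define SKETCH_HAVE_SINGLE_THREADED 1
# endif
#endif

/* Nonzero only when libc reports that no second thread exists yet. */
int is_single_threaded(void)
{
#ifdef SKETCH_HAVE_SINGLE_THREADED
    return __libc_single_threaded != 0;
#else
    return 0;   /* unknown: take the safe, atomic path */
#endif
}

/* Add delta to *count and return the previous value, skipping the
   atomic read-modify-write while the process is single-threaded. */
int refcount_add(atomic_int *count, int delta)
{
    assert(count != NULL);
    if (is_single_threaded()) {
        int old = atomic_load_explicit(count, memory_order_relaxed);
        atomic_store_explicit(count, old + delta, memory_order_relaxed);
        return old;
    }
    return atomic_fetch_add_explicit(count, delta, memory_order_acq_rel);
}
```

As the commit message points out, this kind of check suits operations that are complete in one step. Paired operations such as lock/unlock need a predicate that cannot change between the two calls.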

[Bug middle-end/94195] missing warning reading a smaller object via an lvalue of a larger type

2020-09-26 Thread dimhen at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94195

Dmitry G. Dyachenko  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dimhen at gmail dot com

--- Comment #3 from Dmitry G. Dyachenko  ---
(In reply to CVS Commits from comment #2)
> The master branch has been updated by Martin Sebor :
> 
> https://gcc.gnu.org/g:3f9a497d1b0dd9da87908a11b59bf364ad40ddca
> 
> commit r11-3306-g3f9a497d1b0dd9da87908a11b59bf364ad40ddca
> Author: Martin Sebor 
> Date:   Sat Sep 19 17:47:29 2020 -0600
> 
> Extend -Warray-bounds to detect out-of-bounds accesses to array
> parameters.
> 
> gcc/ChangeLog:
> 
> PR middle-end/82608
> PR middle-end/94195
> PR c/50584
> PR middle-end/84051
> * gimple-array-bounds.cc (get_base_decl): New function.
> (get_ref_size): New function.
> (trailing_array): New function.
> (array_bounds_checker::check_array_ref): Call them.  Handle
> arrays
> declared in function parameters.
> (array_bounds_checker::check_mem_ref):  Same.  Handle references
> to
> dynamically allocated arrays.
> 
> gcc/testsuite/ChangeLog:
> 
> PR middle-end/82608
> PR middle-end/94195
> PR c/50584
> PR middle-end/84051
> * c-c++-common/Warray-bounds.c: Adjust.
> * gcc.dg/Wbuiltin-declaration-mismatch-9.c: Adjust.
> * gcc.dg/Warray-bounds-63.c: New test.
> * gcc.dg/Warray-bounds-64.c: New test.
> * gcc.dg/Warray-bounds-65.c: New test.
> * gcc.dg/Warray-bounds-66.c: New test.
> * gcc.dg/Warray-bounds-67.c: New test.

I am a bit confused: GCC now produces a warning,
but the access is not outside the allocated memory.
Is this expected?


$ cat x.c
#include <stdlib.h>

struct S1 {
  unsigned x;
};
struct S {
  struct S1 s1;
  int z;
};

void f1()
{
  struct S *pS = (struct S*) calloc(sizeof(struct S1),1);
  if(pS->s1.x == 0)
    return;
  free(pS);
}

$ gcc -O2 -Wall -c x.i
x.c: In function 'f1':
x.c:18:8: warning: array subscript 'struct S[0]' is partly outside array bounds of 'unsigned char[4]' [-Warray-bounds]
   18 |   if(pS->s1.x == 0)
  |^~
x.c:17:30: note: referencing an object of size 4 allocated by 'calloc'
   17 |   struct S *pS = (struct S*) calloc(sizeof(struct S1),1);
  |  ^~~
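
For comparison, here are two variants of the testcase in which the pointer type and the allocation size agree. This is only a sketch with hypothetical function names. It does not answer whether the warning above is intended, and no claim is made about which diagnostics a given GCC version prints for it.

```c
#include <assert.h>
#include <stdlib.h>

struct S1 {
    unsigned x;
};
struct S {
    struct S1 s1;
    int z;
};

/* Variant A: allocate the whole struct S before using an S pointer.
   Returns 1 if x is zero, 0 if not, -1 if allocation failed. */
int first_field_is_zero_full(void)
{
    struct S *pS = calloc(1, sizeof *pS);
    int zero;

    if (pS == NULL)
        return -1;
    zero = (pS->s1.x == 0);
    free(pS);
    return zero;
}

/* Variant B: allocate only an S1 and access it through an S1 pointer. */
int first_field_is_zero_prefix(void)
{
    struct S1 *p1 = calloc(1, sizeof *p1);
    int zero;

    /* S is larger than S1, so an S1-sized block cannot hold a full S. */
    assert(sizeof(struct S) > sizeof(struct S1));
    if (p1 == NULL)
        return -1;
    zero = (p1->x == 0);
    free(p1);
    return zero;
}
```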

[Bug target/97044] Undefined format macros because of include order on AIX

2020-09-26 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97044

--- Comment #5 from CVS Commits  ---
The master branch has been updated by David Edelsohn :

https://gcc.gnu.org/g:081b3517b4df826ac917147eb906bbb8fc6528b1

commit r11-3482-g081b3517b4df826ac917147eb906bbb8fc6528b1
Author: David Edelsohn 
Date:   Thu Sep 17 15:18:48 2020 +

aix: Fix _STDC_FORMAT_MACROS in inttypes.h [PR97044]

AIX protects the STDC Format Macros in a manner that can prevent the
definition of the macros depending on the order of header inclusion.

The protection of the macros was referenced in C99, removed in C11, and
never specified in any C++ standard. Also, the macros are in the namespace
reserved to the implementation (compiler) so the compiler is permitted to
choose to inject those names.

fixincludes/ChangeLog:

2020-09-17  David Edelsohn  

PR target/97044
* inclhack.def (aix_inttypes): New fix.
* fixincl.x: Regenerate.
* tests/base/sys/inttypes.h: New file.
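
A minimal sketch of how user code commonly guards against the include-order problem described above, assuming the conventional `__STDC_FORMAT_MACROS` request macro: define it before the first inclusion of `<inttypes.h>`. The helper `format_i64` is a hypothetical name. In C the macros are available unconditionally, so the define is harmless there.

```c
/* Must come before the first direct or indirect include of <inttypes.h>. */
#ifndef __STDC_FORMAT_MACROS
# define __STDC_FORMAT_MACROS 1
#endif

#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit value portably using the PRId64 format macro.
   Returns what snprintf returns: the length of the formatted text. */
int format_i64(char *buf, size_t len, int64_t v)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" PRId64, v);
}
```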

[Bug c++/97214] New: ICE in lookup_template_class_1, at cp/pt.c:9896

2020-09-26 Thread sfranzen85 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97214

Bug ID: 97214
   Summary: ICE in lookup_template_class_1, at cp/pt.c:9896
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sfranzen85 at hotmail dot com
  Target Milestone: ---

The following snippet reproduces an error I encountered with lambdas:

struct Foo {
    // void operator()(int) {}

    template<typename T>
    void operator()(T, T)
    {
        auto bar = [this](auto&& v){ operator()(v); };
    }
};

int main ()
{
    Foo{}(0,1);
    return 0;
}

Full error:
../src/main.cpp: In instantiation of ‘void Foo::operator()(T, T) [with T =
int]’:
../src/main.cpp:13:14:   required from here
../src/main.cpp:7:38: internal compiler error: in lookup_template_class_1, at
cp/pt.c:9896
7 | auto bar = [this](auto&& v){ operator()(v); };
  |  ^~

Further observations:
* The error appears regardless of available operator()() overloads;
* It only appears if the function call is unqualified, e.g. (*this)(v) is fine
if the overload exists.

A possibly related error is given using a version of Foo without a function
template:
struct Foo {
    void operator()(int) {}

    void operator()(int a, int)
    {
        auto bar = [this](auto&& v){ operator()(v); };
        bar(a);
    }
};

../src/main.cpp: In instantiation of ‘Foo::operator()(int,
int):: [with auto:1 = int&]’:
../src/main.cpp:8:14:   required from here
../src/main.cpp:7:48: error: use of ‘Foo::operator()(int,
int):: [with auto:1 = int&]’ before deduction of ‘auto’
7 | auto bar = [this](auto&& v){ operator()(v); };
  |  ~~^~~

This error similarly only appears with the unqualified call, and also
disappears if the lambda has '-> void'.

[Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?

2020-09-26 Thread ttsiodras at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

Thanassis Tsiodras  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #4 from Thanassis Tsiodras  ---
Marking as resolved.

[Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?

2020-09-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

--- Comment #3 from Jakub Jelinek  ---
Note, I think a significant part of the speedup comes from tail recursion optimization, which will
be prevented even with a mergeable task.  Computing Fibonacci this way is not
efficient.

[Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?

2020-09-26 Thread ttsiodras at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

--- Comment #2 from Thanassis Tsiodras  ---
I see. I was not aware of "mergeable", TBH - thanks for pointing it out (it led
me to reading about "data environments"). 

Thanks, Jakub.

[Bug libgomp/97213] OpenMP "if" is dramatically slower than code-level "if" - why?

2020-09-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

--- Comment #1 from Jakub Jelinek  ---
Even with if(false) the implementation has to create a new data environment
etc.
if(false) just means the task will be included, i.e. the generating task will
only continue when the included task finishes and the generating thread will
execute the task.
You'd also need to add the mergeable clause to let the implementation, for if(false),
pretend there wasn't a task directive at all, but that is just an
optimization option that GCC doesn't use right now (it would basically require
copying the region once again).
Also, there is the overhead of the taskwait that you perform unconditionally at
all levels.
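
A sketch of where the mergeable clause would go, under some assumptions: `fib_tasks` and its `cutoff` parameter are hypothetical, and separate variables `a` and `b` replace the shared `total` to avoid the race mentioned in the report. As noted above, GCC does not currently exploit mergeable, so this shows the syntax and is not expected to be faster.

```c
#include <assert.h>

/* Tasks are deferred only when n >= cutoff; below that the if clause is
   false, so each task is undeferred and run by the generating thread.
   mergeable permits (but does not require) the implementation to
   treat such a task as if the directive were absent. */
long fib_tasks(int n, int cutoff)
{
    long a = 0, b = 0;

    assert(n >= 0);
    if (n < 2)
        return n;

    #pragma omp task shared(a) if(n >= cutoff) mergeable
    a = fib_tasks(n - 1, cutoff);
    #pragma omp task shared(b) if(n >= cutoff) mergeable
    b = fib_tasks(n - 2, cutoff);
    #pragma omp taskwait   /* still executed at every level */

    return a + b;
}
```

Built with -fopenmp, the function is meant to be called from inside a parallel region's single construct, as in the original benchmark. Built without it, the pragmas are ignored and the code runs serially.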

[Bug libgomp/97213] New: OpenMP "if" is dramatically slower than code-level "if" - why?

2020-09-26 Thread ttsiodras at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97213

Bug ID: 97213
   Summary: OpenMP "if" is dramatically slower than code-level
"if" - why?
   Product: gcc
   Version: 10.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ttsiodras at gmail dot com
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

In trying to understand how OpenMP `task` works, I did this benchmark:

#include <stdio.h>
#include 

long fib(int val)
{
    if (val < 2)
        return val;

    long total = 0;
    {
        #pragma omp task shared(total) if(val==45)
        total += fib(val-1);
        #pragma omp task shared(total) if(val==45)
        total += fib(val-2);
        #pragma omp taskwait
    }
    return total;
}

int main()
{
    #pragma omp parallel
    #pragma omp single
    {
        long res = fib(45);
        printf("fib(45)=%ld\n", res);
    }
}

It's a simple Fibonacci calculation, that only spawns two tasks at the
top-level of fib(45) - basically, one thread does fib(44), the other does
fib(43); and the results are added and returned.

I know there's a chance for a race on the "+=" of the total - but that's not
the point of this... Here's the performance in my i5 laptop:

$ gcc -O2 with_openmp_if.c -fopenmp

$ time ./a.out 
fib(45)=1134903170

real    1m4.244s
user    1m44.696s
sys     0m0.010s

64 seconds... Now compare this, to the same code, but with the "if" moved from
OpenMP level, to user code level - i.e. this change in "fib":

long fib(int val)
{
    if (val < 2)
        return val;

    long total = 0;
    {
        if (val == 45) {
            #pragma omp task shared(total)
            total += fib(val-1);
            #pragma omp task shared(total)
            total += fib(val-2);
            #pragma omp taskwait
        } else
            return fib(val-1) + fib(val-2);
    }
    return total;
}

$ gcc -O2 with_normal_if.c -fopenmp

$ time ./a.out 
fib(45)=1134903170

real    0m8.585s
user    0m14.021s
sys     0m0.011s

We go from 64 seconds down to 8.5 seconds.

Why? 

What does the OpenMP-level "if" do so differently that it makes the code almost an
order of magnitude slower?

[Bug fortran/96495] [gfortran] Composition of user-defined operators does not copy ALLOCATABLE property of derived type

2020-09-26 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96495

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Paul Thomas :

https://gcc.gnu.org/g:5b26b3b3f5c75a86a5a3e851866247ac7fcb6c8b

commit r11-3480-g5b26b3b3f5c75a86a5a3e851866247ac7fcb6c8b
Author: Paul Thomas 
Date:   Sat Sep 26 12:32:35 2020 +0100

Correct overwrite of alloc_comp_result_2.f90 in fix of PR96495.

2020-26-09  Paul Thomas  

gcc/testsuite/
PR fortran/96495
* gfortran.dg/alloc_comp_result_2.f90 : Restore original.
* gfortran.dg/alloc_comp_result_3.f90 : New test.

[Bug libgomp/97212] New: [OpenMP] 'depend' clause with 'target nowait' (!) + 'task' does not work

2020-09-26 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97212

Bug ID: 97212
   Summary: [OpenMP] 'depend' clause with 'target nowait' (!) +
'task' does not work
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Keywords: openmp, wrong-code
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: burnus at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 49274
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49274&action=edit
C testcase, run with -fopenmp

The SOLLVE_VV testcase
https://github.com/SOLLVE/sollve_vv/blob/master/tests/4.5/task/test_target_and_task_nowait.c

FAILS. Note: It also fails with a compiler which is not even configured for
offloading and, hence, everything is run on the host.


It uses 'target' with 'nowait' and 'depend', followed by a dependent 'task':

#pragma omp target map(tofrom: a, sum) depend(out: a) nowait
  ... (set 'a') ...

#pragma omp task depend(in: a) shared(a,errors)
  ... check value of ...

A comment indicates a problem with real-world code:

// This test checks if dependence expressed on target and task 
// regions are honoured in the presense of nowait.
// This test is motivated by OpenMP usage in QMCPack.
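
The dependence pattern can be sketched with two host tasks. This is not the SOLLVE_VV test: it replaces the 'target ... nowait' region with an ordinary task, and `run_dependent_tasks` is a hypothetical name. The intent is that the depend(in: a) task may only start once the depend(out: a) task has finished.

```c
#include <assert.h>

/* Returns the value of `a` observed by the consumer task. */
int run_dependent_tasks(void)
{
    int a = 0;
    int seen = -1;

    #pragma omp parallel
    #pragma omp single
    {
        /* Producer: declares that it writes `a`. */
        #pragma omp task depend(out: a) shared(a)
        a = 42;

        /* Consumer: must wait for the producer because it reads `a`. */
        #pragma omp task depend(in: a) shared(a, seen)
        seen = a;

        #pragma omp taskwait
    }
    assert(seen != -1);   /* the consumer task has run by now */
    return seen;
}
```

If the dependence is honoured, the function returns 42. Built without -fopenmp, the pragmas are ignored and the statements run in program order.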

[Bug bootstrap/97163] Build error with -mcpu=power9 on ppc64

2020-09-26 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97163

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:d00b1b023ecfc3ddc3fe952c0063dab7529d5f7a

commit r11-3476-gd00b1b023ecfc3ddc3fe952c0063dab7529d5f7a
Author: Jakub Jelinek 
Date:   Sat Sep 26 10:07:41 2020 +0200

powerpc, libcpp: Fix gcc build with clang on power8 [PR97163]

libcpp has two specialized altivec implementations of search_line_fast,
one for power8+ and the other one otherwise.
Both use __attribute__((altivec(vector))) and the GCC builtins rather than
altivec.h and the APIs from there, which is fine, but should be restricted
to when libcpp is built with GCC, so that it can be relied on.
The second elif is
and thus e.g. when built with clang it isn't picked, but the first one was
just guarded with
and so according to the bugreporter clang fails miserably on that.

The following patch fixes that by adding the same GCC_VERSION requirement
as the second version.  I don't know where the 4.5 in there comes from and
the exact version doesn't matter that much, as long as it is above 4.2 that
clang pretends to be and smaller or equal to 4.8 as the oldest gcc we
support as bootstrap compiler ATM.
Furthermore, the patch fixes the comment: the version it is talking about is
not pre-GCC 5, but actually the GCC 5+ one.

2020-09-26  Jakub Jelinek  

PR bootstrap/97163
* lex.c (search_line_fast): Only use _ARCH_PWR8 Altivec version
for GCC >= 4.5.
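
The guard described in the commit message can be sketched as follows. The conditions are simplified and are not the ones in lex.c: `SKETCH_GCC_VERSION` is a stand-in for libcpp's GCC_VERSION, assumed here to be major * 1000 + minor, and both function names are hypothetical. Per the message, clang reports itself as GCC 4.2, so a threshold of 4.5 keeps it on the generic path.

```c
#include <assert.h>

#if defined(__GNUC__)
# define SKETCH_GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#else
# define SKETCH_GCC_VERSION 0
#endif

/* Nonzero if the compiler claims to be at least GCC major.minor. */
int gcc_version_at_least(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor < 1000);
    return SKETCH_GCC_VERSION >= major * 1000 + minor;
}

/* Name of the implementation the preprocessor guards would select. */
const char *search_impl_name(void)
{
#if SKETCH_GCC_VERSION >= 4005 && defined(_ARCH_PWR8) && defined(__ALTIVEC__)
    return "altivec-power8";
#elif SKETCH_GCC_VERSION >= 4005 && defined(__ALTIVEC__)
    return "altivec";
#else
    return "generic";
#endif
}
```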