[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 Rainer Orth changed: What|Removed |Added Resolution|--- |FIXED Status|REOPENED|RESOLVED --- Comment #22 from Rainer Orth --- (In reply to Jakub Jelinek from comment #21) > Yeah, but clearly not in libgomp, but in the testcase, so IMNSHO we > shouldn't reuse this PR for that. On x86_64-linux I see movaps into and out > from the thr.1 TLS variable and in two spots to/from stack, the thr.1 TLS > var has .align 16, so all seems to be ok. You're right: this failure was already introduced between 20210802 and 20210804. It's another instance of PR target/102772.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #21 from Jakub Jelinek --- Yeah, but clearly not in libgomp, but in the testcase, so IMNSHO we shouldn't reuse this PR for that. On x86_64-linux I see movaps into and out from the thr.1 TLS variable and in two spots to/from stack, the thr.1 TLS var has .align 16, so all seems to be ok.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 Rainer Orth changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|FIXED |--- --- Comment #20 from Rainer Orth --- Almost: seems I overlooked one remaining failure which seems related in the preexisting ones (libgomp.c++/task-reduction-*.C random timeouts, PR libgomp/88707, and PR libgomp/102841): libgomp.fortran/pointer2.f90 still FAILs at -O2 and above: FAIL: libgomp.fortran/pointer2.f90 -O2 execution test FAIL: libgomp.fortran/pointer2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test FAIL: libgomp.fortran/pointer2.f90 -O3 -g execution test Program received signal SIGSEGV: Segmentation fault - invalid memory reference. Backtrace for this error: Segmentation Fault Thread 9 received signal SIGSEGV, Segmentation fault. [Switching to Thread 2 (LWP 2)] 0x080517ba in MAIN__::MAIN__._omp_fn.0 () at /vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.fortran/pointer2.f90:14 14 !$omp parallel copyin (thr) reduction(.or.:l) reduction(+:i) 1: x/i $pc => 0x80517ba : movaps %xmm7,-0x28(%ebx) (gdb) where #0 0x080517ba in MAIN__::MAIN__._omp_fn.0 () at /vol/gcc/src/hg/master/local/libgomp/testsuite/libgomp.fortran/pointer2.f90:14 #1 0xfe2d40dc in gomp_thread_start (xdata=) at /vol/gcc/src/hg/master/local/libgomp/team.c:129 #2 0xfdfd327b in _thrp_setup () from /lib/libc.so.1 #3 0xfdfd35b0 in ?? () from /lib/libc.so.1 #4 0x in ?? () (gdb) p/x $ebx $1 = 0xfe1202c0 (gdb) p/x $ebx-0x28 $2 = 0xfe120298 so this is another unaligned access, it seems.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #19 from Jakub Jelinek --- Hopefully fixed.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #18 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:17da2c7425ea1f5bf417b954f444dbe1f1618a1c commit r12-5362-g17da2c7425ea1f5bf417b954f444dbe1f1618a1c Author: Jakub Jelinek Date: Thu Nov 18 09:10:40 2021 +0100 libgomp: Ensure that either gomp_team is properly aligned [PR102838] struct gomp_team has struct gomp_work_share array inside of it. If that latter structure has 64-byte aligned member in the middle, the whole struct gomp_team needs to be 64-byte aligned, but we weren't allocating it using gomp_aligned_alloc. This patch fixes that, except that on gcn team_malloc is special, so I've instead decided at least for now to avoid using aligned member and use the padding instead on gcn. 2021-11-18 Jakub Jelinek PR libgomp/102838 * libgomp.h (GOMP_USE_ALIGNED_WORK_SHARES): Define if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined and __AMDGCN__ is not. (struct gomp_work_share): Use GOMP_USE_ALIGNED_WORK_SHARES instead of GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC. * work.c (alloc_work_share, gomp_work_share_start): Likewise. * team.c (gomp_new_team): If GOMP_USE_ALIGNED_WORK_SHARES, use gomp_aligned_alloc instead of team_malloc.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #17 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:7a2aa63fad06a72d9770b08491f1a7809eac7c50 commit r12-5361-g7a2aa63fad06a72d9770b08491f1a7809eac7c50 Author: Jakub Jelinek Date: Thu Nov 18 09:07:31 2021 +0100 libgomp: Fix up aligned_alloc arguments [PR102838] C says that aligned_alloc size must be an integral multiple of alignment. While glibc doesn't care about it, apparently Solaris does. So, this patch decreases the priority of aligned_alloc among the other variants because it needs more work and can waste more memory and rounds up the size to multiple of alignment. 2021-11-18 Jakub Jelinek PR libgomp/102838 * alloc.c (gomp_aligned_alloc): Prefer _aligned_alloc over memalign over posix_memalign over aligned_alloc over fallback with malloc instead of aligned_alloc over _aligned_alloc over posix_memalign over memalign over fallback with malloc. For aligned_alloc, round up size up to multiple of al.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #16 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #15 from Jakub Jelinek --- > Ah, bet Solaris aligned_alloc relies on: > "the value of size shall be an integral multiple of alignment" > (glibc aligned_alloc doesn't). Indeed, cf. aligned_alloc(3C): The alignment argument must be a valid alignment, that is, any power of two (1, 2, 4, 8, ...) and the size argument must be an integral multi- ple of alignment. > Does memalign or posix_memalign rely on that too, or just aligned_alloc? memalign(3C) has: The memalign() function allocates size bytes on a specified alignment boundary and returns a pointer to the allocated block. The value of the returned address is guaranteed to be an even multiple of alignment. The value of alignment must be a power of two and must be greater than or equal to the size of a word. and posix_memalign(3C): The posix_memalign() function allocates size bytes aligned on a bound- ary specified by alignment, and returns a pointer to the allocated mem- ory in memptr. The value of alignment must be a power of two multiple of sizeof(void *). > If just aligned_alloc, we could do incrementally: > 2021-11-17 Jakub Jelinek > > PR libgomp/102838 > * alloc.c (gomp_aligned_alloc): Prefer _aligned_alloc over > memalign over posix_memalign over aligned_alloc over fallback > with malloc instead of aligned_alloc over _aligned_alloc over > posix_memalign over memalign over fallback with malloc. For > aligned_alloc, round up size up to multiple of al. That patch worked just fine (only tried on i386-pc-solaris2.11, 32 and 64-bit): all 64-bit failures are gone again. Thanks.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #15 from Jakub Jelinek --- Ah, bet Solaris aligned_alloc relies on: "the value of size shall be an integral multiple of alignment" (glibc aligned_alloc doesn't). Does memalign or posix_memalign rely on that too, or just aligned_alloc? If just aligned_alloc, we could do incrementally: 2021-11-17 Jakub Jelinek PR libgomp/102838 * alloc.c (gomp_aligned_alloc): Prefer _aligned_alloc over memalign over posix_memalign over aligned_alloc over fallback with malloc instead of aligned_alloc over _aligned_alloc over posix_memalign over memalign over fallback with malloc. For aligned_alloc, round up size up to multiple of al. --- libgomp/alloc.c.jj 2021-01-04 10:25:56.157037659 +0100 +++ libgomp/alloc.c 2021-11-17 13:32:25.246271672 +0100 @@ -65,18 +65,24 @@ gomp_aligned_alloc (size_t al, size_t si void *ret; if (al < sizeof (void *)) al = sizeof (void *); -#ifdef HAVE_ALIGNED_ALLOC - ret = aligned_alloc (al, size); -#elif defined(HAVE__ALIGNED_MALLOC) +#ifdef HAVE__ALIGNED_MALLOC ret = _aligned_malloc (size, al); -#elif defined(HAVE_POSIX_MEMALIGN) - if (posix_memalign (&ret, al, size) != 0) -ret = NULL; #elif defined(HAVE_MEMALIGN) { extern void *memalign (size_t, size_t); ret = memalign (al, size); } +#elif defined(HAVE_POSIX_MEMALIGN) + if (posix_memalign (&ret, al, size) != 0) +ret = NULL; +#lif defined(HAVE_ALIGNED_ALLOC) + { +size_t sz = (size + al - 1) & ~(al - 1); +if (__builtin_expect (sz >= size, 1)) + ret = aligned_alloc (al, sz); +else + ret = NULL; + } #else ret = NULL; if ((al & (al - 1)) == 0 && size)
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #14 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #13 from Jakub Jelinek --- > Created attachment 51807 > --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51807&action=edit > gcc12-pr102838-2.patch > > Does this patch fix it? While the patch fixes all 32-bit x86 failures, it introduces ca. 230 execution failures for 64-bit (both x86 and sparc), all like libgomp: Out of memory allocating 2656 bytes e.g. for libgomp.c/single-1.c with this stack trace: Thread 2 hit Breakpoint 1, gomp_fatal (fmt=fmt@entry=0x7fff57b183d8 "Out of memory allocating %lu bytes") at /vol/gcc/src/hg/master/local/libgomp/error.c:85 85 { (gdb) where #0 gomp_fatal (fmt=fmt@entry=0x7fff57b183d8 "Out of memory allocating %lu bytes") at /vol/gcc/src/hg/master/local/libgomp/error.c:85 #1 0x7fff57b1b278 in gomp_aligned_alloc (al=al@entry=64, size=size@entry=2656) at /vol/gcc/src/hg/master/local/libgomp/alloc.c:94 #2 0x7fff57b2e1eb in gomp_new_team (nthreads=nthreads@entry=3) at /vol/gcc/src/hg/master/local/libgomp/team.c:181 #3 0x7fff57b25627 in GOMP_parallel_start (fn=0x401410 , data=0x0, num_threads=3) at /vol/gcc/src/hg/master/local/libgomp/parallel.c:133 #4 0x004014e6 in main () Running the test under truss shows this immediately before the error message: brk(0x) = 0x00501980 brk(0x00501980) = 0x brk(0x00505980) = 0x
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #13 from Jakub Jelinek --- Created attachment 51807 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51807&action=edit gcc12-pr102838-2.patch Does this patch fix it? Sorry for the delay...
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 Thomas Schwinge changed: What|Removed |Added CC||jules at gcc dot gnu.org --- Comment #12 from Thomas Schwinge --- *** Bug 102856 has been marked as a duplicate of this bug. ***
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #11 from Jakub Jelinek --- Ah, I see, there is still the team->work_shares array where the whole team structure in which it is present is allocated with gomp_malloc. So, either we need to drop the aligned (64) attribute regardless of GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC, or make sure that if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC we use gomp_aligned_alloc instead of gomp_malloc for team_malloc, or combination of both (GCN has its own separate team_malloc, so perhaps get rid of aligned attribute on GCN too).
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #10 from Rainer Orth --- Created attachment 51644 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51644&action=edit 32-bit i386-pc-solaris2.11 loop_ull.[is]
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #9 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #8 from Jakub Jelinek --- > Ok, so, first question, is GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC defined in your > case? It is since all of HAVE_ALIGNED_ALLOC, HAVE_POSIX_MEMALIGN, and HAVE_MEMALIGN are defined in config.h. > Can be seen e.g. from objdump -dr alloc.o and seeing if gomp_aligned_free is > just a (tail?) call to free without anything else or if it performs > if (ptr) > free (((void **) ptr)[-1]); It's the former: just a call to free. > Or perhaps attach preprocessed loop-ull.i and loop-ull.s. Doing so just in case...
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #8 from Jakub Jelinek --- Ok, so, first question, is GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC defined in your case? Can be seen e.g. from objdump -dr alloc.o and seeing if gomp_aligned_free is just a (tail?) call to free without anything else or if it performs if (ptr) free (((void **) ptr)[-1]); Or perhaps attach preprocessed loop-ull.i and loop-ull.s.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #7 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #6 from ro at CeBiTec dot Uni-Bielefeld.DE Uni-Bielefeld.DE> --- >> --- Comment #5 from Jakub Jelinek --- >> Does the committed patch fix the issue on Solaris? > > I'll see after tonight's bootstrap. The original one attached to the PR > fixed only a few of the failures: > > -FAIL: libgomp.c/../libgomp.c-c++-common/for-1.c execution test > > -FAIL: libgomp.c/../libgomp.c-c++-common/for-2.c execution test > > -FAIL: libgomp.c/../libgomp.c-c++-common/for-7.c execution test > -FAIL: libgomp.c/../libgomp.c-c++-common/for-8.c execution test The committed version is the same: quite a number of the previous failures remain. E.g. even with OMP_NUM_THREADS=8, for-3.c starts a large number of threads (200+), then hangs under gdb, but if I let it run directly, it SEGVs again and the core dump shows the same unaligned access (with dbx, gdb cannot interpret the core dump): t@198 (l@198) terminated by signal SEGV (no mapping at the fault address) 0xfe5d7fb8: gomp_loop_ull_guided_start+0x01a8: movaps %xmm0,0x0010(%esi) (dbx) print $esi $esi = 134981864
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #6 from ro at CeBiTec dot Uni-Bielefeld.DE --- > --- Comment #5 from Jakub Jelinek --- > Does the committed patch fix the issue on Solaris? I'll see after tonight's bootstrap. The original one attached to the PR fixed only a few of the failures: -FAIL: libgomp.c/../libgomp.c-c++-common/for-1.c execution test -FAIL: libgomp.c/../libgomp.c-c++-common/for-2.c execution test -FAIL: libgomp.c/../libgomp.c-c++-common/for-7.c execution test -FAIL: libgomp.c/../libgomp.c-c++-common/for-8.c execution test
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #5 from Jakub Jelinek --- Does the committed patch fix the issue on Solaris?
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #4 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:c7abdf46fb7ac9a0c37f120feff3fcc3a752584f commit r12-4529-gc7abdf46fb7ac9a0c37f120feff3fcc3a752584f Author: Jakub Jelinek Date: Wed Oct 20 09:34:51 2021 +0200 openmp: Fix up struct gomp_work_share handling [PR102838] If GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is not defined, the intent was to treat the split of the structure between first cacheline (64 bytes) as mostly write-once, use afterwards and second cacheline as rw just as an optimization. But as has been reported, with vectorization enabled at -O2 it can now result in aligned vector 16-byte or larger stores. When not having posix_memalign/aligned_alloc/memalign or other similar API, alloc.c emulates it but it needs to allocate extra memory for the dynamic realignment. So, for the GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC not defined case, this patch stops using aligned (64) attribute in the middle of the structure and instead inserts padding that puts the second half of the structure at offset 64 bytes. And when GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, usually it was allocated as aligned, but for the orphaned case it could still be allocated just with gomp_malloc without guaranteed proper alignment. 2021-10-20 Jakub Jelinek PR libgomp/102838 * libgomp.h (struct gomp_work_share_1st_cacheline): New type. (struct gomp_work_share): Only use aligned(64) attribute if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined, otherwise just add padding before lock to ensure lock is at offset 64 bytes into the structure. (gomp_workshare_struct_check1, gomp_workshare_struct_check2): New poor man's static assertions. * work.c (gomp_work_share_start): Use gomp_aligned_alloc instead of gomp_malloc if GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Last reconfirmed||2021-10-19 Ever confirmed|0 |1 Status|UNCONFIRMED |ASSIGNED --- Comment #3 from Jakub Jelinek --- Created attachment 51633 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51633&action=edit gcc12-pr102838.patch Very lightly tested patch. This does two things, for orphaned construct when GOMP_HAVE_EFFICIENT_ALIGNED_ALLOC is defined uses gomp_aligned_alloc rather than gomp_malloc even for the single workshare allocation, because when we have the aligned attribute in there, the compiler is right to assume that alignment. And if the macro isn't defined (i.e. OS doesn't have working posix_memalign, memalign, aligned_alloc etc.) instead of having aligned attribute in the structure it just makes sure that the second half starts on a 64-byte boundary.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #2 from Jakub Jelinek --- Actually no, I'm wrong, there is gomp_mutex_t lock __attribute__((aligned (64))); in the middle of the structure, so the start of the structure is 64-byte aligned.
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 --- Comment #1 from Jakub Jelinek --- Possibly triggered by enabling vectorization at -O2? long long should for ia32 inside of structures (at least on linux but I think it is the same for solaris too) have just 32-bit alignment and nothing else in struct gomp_work_share should result in 64-bit alignment of the structure, so if the vectorizer (slp?) decides to use aligned 128-bit store, something is wrong...
[Bug libgomp/102838] [12 regression] Several tests SEGV in gomp_loop_ull_init
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102838 Rainer Orth changed: What|Removed |Added Target Milestone|--- |12.0