Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
Hi Bernd, I believe that all omp threads are created in detached state, so pthread_join should be undefined on them, just tsan*thinks* otherwise? When I look further on the libgomp sources, I see there are two completely different implementations of the mutexes, barriers, etc. One using posix functions which should be visible to tsan and another one using raw linux futex calls which should be invisible to tsan, by default the linux system calls are used, which explains why tsan seems to be unaware of the actual synchronization in this example. Have you tried to build with --enable-linux-futex=no ? I just did that, and it appears that the thread sanitizer no longer detects any issues even without my patch (at least on powerpc64-unknown-linux-gnu, gcc110), which showed the behavior before. So, what is the correct way forward? Modifying apparently correct code in libgfortran isn't the answer, I think. Having false positives when users specify -fopenmp -fsanitize=thread is also not nice - this makes debugging multithreaded applications hard. Would it be possible to build a posix-thread-only version of libgomp to be linked in when -fsanitize=thread is supplied? Jakub, any suggestions? Regards Thomas
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
On 10/01/17 15:41, Thomas Koenig wrote: > Am 01.10.2017 um 10:59 schrieb Bernd Edlinger: >> maybe there is a way how you could explicitly join >> all running threads? > > Yes, that seems to do the trick. Thanks! > Oh, that is really very surprising... I believe that all omp threads are created in detached state, so pthread_join should be undefined on them, just tsan *thinks* otherwise? When I look further on the libgomp sources, I see there are two completely different implementations of the mutexes, barriers, etc. One using posix functions which should be visible to tsan and another one using raw linux futex calls which should be invisible to tsan, by default the linux system calls are used, which explains why tsan seems to be unaware of the actual synchronization in this example. Have you tried to build with --enable-linux-futex=no ? I just saw the following line in libgomp/configure.tgt: # Since we require POSIX threads, assume a POSIX system by default. config_path="posix" # Check for futex enabled all at once. if test x$enable_linux_futex = xyes; then So maybe completely different things will be reported if this value is set to no? > Here is a patch which appears to work. It does hit a snag with static > linking, though, because it calls __gthread_self (), and that causes > a segfault with -static :-(. > > The test case in question is static_linking_1.f. > > This appears to be a general problem, and has been discussed > before, for example in > https://gcc.gnu.org/ml/gcc-help/2010-05/msg00029.html . > > What would be the best way to proceed? Modify the behavior of -static > with gfortran? > > Regards > > Thomas > > 2017-10-01 Thomas Koenig> > > PR fortran/66756 > PR fortran/82378 > * io/io.h: Add field th to gfc_unit. Add prototypes for > lock_unit and trylock_unit. > * io/unit.c (insert_unit): Do not create lock and lock, move to > (gfc_get_unit): here; lock after insert_unit has succeded. > Use lock_unit and trylock_unit instead of __gthread_mutex_lock > and __gthread_mutex_trylock. > (init_units): Do not unlock unit locks for stdin, stdout and > stderr. > (lock_unit): New function. > (trylock_unit): New function. > (close_units): If a unit still has a lock, wait for the > completion of the corresponding thread. > * io/unix.c (find_file): Use lock_unit and trylock_unit instead > of __gthread_mutex_lock and __gthread_mutex_trylock. > (flush_all_units): Likewise. > > 2017-10-01 Thomas Koenig > > PR fortran/66756 > PR fortran/82378 > * gfortran.dg/openmp-close.f90: New test.
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
Am 01.10.2017 um 10:59 schrieb Bernd Edlinger: maybe there is a way how you could explicitly join all running threads? Yes, that seems to do the trick. Thanks! Here is a patch which appears to work. It does hit a snag with static linking, though, because it calls __gthread_self (), and that causes a segfault with -static :-(. The test case in question is static_linking_1.f. This appears to be a general problem, and has been discussed before, for example in https://gcc.gnu.org/ml/gcc-help/2010-05/msg00029.html . What would be the best way to proceed? Modify the behavior of -static with gfortran? Regards Thomas 2017-10-01 Thomas KoenigPR fortran/66756 PR fortran/82378 * io/io.h: Add field th to gfc_unit. Add prototypes for lock_unit and trylock_unit. * io/unit.c (insert_unit): Do not create lock and lock, move to (gfc_get_unit): here; lock after insert_unit has succeded. Use lock_unit and trylock_unit instead of __gthread_mutex_lock and __gthread_mutex_trylock. (init_units): Do not unlock unit locks for stdin, stdout and stderr. (lock_unit): New function. (trylock_unit): New function. (close_units): If a unit still has a lock, wait for the completion of the corresponding thread. * io/unix.c (find_file): Use lock_unit and trylock_unit instead of __gthread_mutex_lock and __gthread_mutex_trylock. (flush_all_units): Likewise. 2017-10-01 Thomas Koenig PR fortran/66756 PR fortran/82378 * gfortran.dg/openmp-close.f90: New test. Index: io/io.h === --- io/io.h (Revision 253162) +++ io/io.h (Arbeitskopie) @@ -661,6 +661,8 @@ typedef struct gfc_unit int continued; __gthread_mutex_t lock; + /* ID of the thread currently holding the lock. */ + __gthread_t th; /* Number of threads waiting to acquire this unit's lock. When non-zero, close_unit doesn't only removes the unit from the UNIT_ROOT tree, but doesn't free it and the @@ -764,6 +766,12 @@ internal_proto(get_unit); extern void unlock_unit (gfc_unit *); internal_proto(unlock_unit); +extern void lock_unit (gfc_unit *); +internal_proto(lock_unit); + +extern int trylock_unit (gfc_unit *); +internal_proto (trylock_unit); + extern void finish_last_advance_record (gfc_unit *u); internal_proto (finish_last_advance_record); Index: io/unit.c === --- io/unit.c (Revision 253162) +++ io/unit.c (Arbeitskopie) @@ -221,9 +221,9 @@ insert (gfc_unit *new, gfc_unit *t) return t; } +/* insert_unit()-- Create a new node, insert it into the treap. It is assumed + that the caller holds unit_lock. */ -/* insert_unit()-- Create a new node, insert it into the treap. */ - static gfc_unit * insert_unit (int n) { @@ -237,7 +237,6 @@ insert_unit (int n) #else __GTHREAD_MUTEX_INIT_FUNCTION (>lock); #endif - __gthread_mutex_lock (>lock); u->priority = pseudo_random (); unit_root = insert (u, unit_root); return u; @@ -361,9 +360,12 @@ retry: if (created) { - /* Newly created units have their lock held already - from insert_unit. Just unlock UNIT_LOCK and return. */ __gthread_mutex_unlock (_lock); + + /* Nobody outside this address has seen this unit yet. We could safely + keep it unlocked until now. */ + + lock_unit (p); return p; } @@ -371,7 +373,7 @@ found: if (p != NULL && (p->child_dtio == 0)) { /* Fast path. */ - if (! __gthread_mutex_trylock (>lock)) + if (! trylock_unit (p)) { /* assert (p->closed == 0); */ __gthread_mutex_unlock (_lock); @@ -386,7 +388,7 @@ found: if (p != NULL && (p->child_dtio == 0)) { - __gthread_mutex_lock (>lock); + lock_unit (p); if (p->closed) { __gthread_mutex_lock (_lock); @@ -616,10 +618,9 @@ init_units (void) u->endfile = NO_ENDFILE; u->filename = strdup (stdin_name); + u->th = __gthread_self (); fbuf_init (u, 0); - - __gthread_mutex_unlock (>lock); } if (options.stdout_unit >= 0) @@ -647,10 +648,9 @@ init_units (void) u->endfile = AT_ENDFILE; u->filename = strdup (stdout_name); + u->th = __gthread_self (); fbuf_init (u, 0); - - __gthread_mutex_unlock (>lock); } if (options.stderr_unit >= 0) @@ -677,11 +677,10 @@ init_units (void) u->endfile = AT_ENDFILE; u->filename = strdup (stderr_name); + u->th = __gthread_self (); fbuf_init (u, 256); /* 256 bytes should be enough, probably not doing any kind of exotic formatting to stderr. */ - - __gthread_mutex_unlock (>lock); } /* Calculate the maximum file offset in a portable
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
Hi, I think this might be a false positive tsan warning. Maybe tsan does not see why a desctructor function can not execute before the last detached thread terminates. I created a small test case that receives a warning which is similar to the one you tried to fix with your patch: cat test.c #include #include int test; static void * t1(void* p) { printf("t1\n"); test = 1; return NULL; } static void __attribute__((destructor)) t2 (void) { test = 2; printf("t2\n"); } int main() { pthread_t t; pthread_attr_t a; pthread_attr_init(); pthread_attr_setdetachstate(, PTHREAD_CREATE_DETACHED); pthread_create(, , t1, NULL); pthread_attr_destroy(); return 0; } gcc -Wall -fsanitize=thread test.c ./a.out t1 t2 == WARNING: ThreadSanitizer: data race (pid=3564) Write of size 4 at 0x0060107c by thread T1: #0 t1 (a.out+0x0040088e) Previous write of size 4 at 0x0060107c by main thread: #0 t2 (a.out+0x004008c6) #1 (ld-linux-x86-64.so.2+0x000108d9) Location is global 'test' of size 4 at 0x0060107c (a.out+0x0060107c) Thread T1 (tid=3566, running) created by main thread at: #0 pthread_create ../../../../gcc-trunk/libsanitizer/tsan/tsan_interceptors.cc:900 (libtsan.so.0+0x0002914e) #1 main (a.out+0x0040092e) SUMMARY: ThreadSanitizer: data race (/home/ed/gnu/gcc-test/a.out+0x40088e) in t1 == ThreadSanitizer: reported 1 warnings The warning goes away when the main function explicitly joins the thread t1. That is what I do with all my threads whenever possible, maybe there is a way how you could explicitly join all running threads? Bernd.
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
On Fri, Sep 29, 2017 at 9:53 PM, Thomas Koenigwrote: > Am 29.09.2017 um 10:03 schrieb Janne Blomqvist: > >> >> 1) I'm confused why fbuf_destroy is modified. The fbuf structure is >> part of gfc_unit, and should be accessed with the same locking rules >> as the rest of the gfc_unit components. When closing the unit, I think >> the same should apply here, no? > > > It is to avoid a data race for > > program main > use omp_lib > !$OMP PARALLEL NUM_THREADS(2) > call file_open(OMP_get_thread_num()) > !$OMP END PARALLEL > contains > recursive subroutine file_open(i) > integer :: i > integer :: nunit > nunit = i + 20 > write (nunit,*) 'asdf' > end subroutine file_open > end program main > > which leads to > > Read of size 4 at 0x7b58ff30 by main thread (mutexes: write M16): > #0 close_unit_1 ../../../trunk/libgfortran/io/unit.c:699 > (libgfortran.so.5+0x00283ba6) > #1 _gfortrani_close_units ../../../trunk/libgfortran/io/unit.c:767 > (libgfortran.so.5+0x00283f59) > #2 cleanup ../../../trunk/libgfortran/runtime/main.c:113 > (libgfortran.so.5+0x0001fe22) > #3 _dl_fini (ld-linux-x86-64.so.2+0xfb62) > > Previous write of size 4 at 0x7b58ff30 by thread T1 (mutexes: write > M17): > #0 finalize_transfer ../../../trunk/libgfortran/io/transfer.c:3934 > (libgfortran.so.5+0x00281aa1) > #1 _gfortran_st_write_done ../../../trunk/libgfortran/io/transfer.c:4125 > (libgfortran.so.5+0x00281e35) > #2 file_open.3528 (a.out+0x00400c1b) > #3 MAIN__._omp_fn.0 (a.out+0x00400d27) > #4 gomp_thread_start ../../../trunk/libgomp/team.c:120 > (libgomp.so.1+0x0001756f) > > Note that not all problems with implicit close at the end are resolved > so far, but that is a different problem. I'm still confused. If gfc_unit->fbuf needs to be protected by holding gfc_unit->lock when closing, surely the same applies to other gfc_units fields as well? How if gfc_unit->fbuf special? AFAICS it isn't.. -- Janne Blomqvist
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
Am 29.09.2017 um 10:03 schrieb Janne Blomqvist: 1) I'm confused why fbuf_destroy is modified. The fbuf structure is part of gfc_unit, and should be accessed with the same locking rules as the rest of the gfc_unit components. When closing the unit, I think the same should apply here, no? It is to avoid a data race for program main use omp_lib !$OMP PARALLEL NUM_THREADS(2) call file_open(OMP_get_thread_num()) !$OMP END PARALLEL contains recursive subroutine file_open(i) integer :: i integer :: nunit nunit = i + 20 write (nunit,*) 'asdf' end subroutine file_open end program main which leads to Read of size 4 at 0x7b58ff30 by main thread (mutexes: write M16): #0 close_unit_1 ../../../trunk/libgfortran/io/unit.c:699 (libgfortran.so.5+0x00283ba6) #1 _gfortrani_close_units ../../../trunk/libgfortran/io/unit.c:767 (libgfortran.so.5+0x00283f59) #2 cleanup ../../../trunk/libgfortran/runtime/main.c:113 (libgfortran.so.5+0x0001fe22) #3 _dl_fini (ld-linux-x86-64.so.2+0xfb62) Previous write of size 4 at 0x7b58ff30 by thread T1 (mutexes: write M17): #0 finalize_transfer ../../../trunk/libgfortran/io/transfer.c:3934 (libgfortran.so.5+0x00281aa1) #1 _gfortran_st_write_done ../../../trunk/libgfortran/io/transfer.c:4125 (libgfortran.so.5+0x00281e35) #2 file_open.3528 (a.out+0x00400c1b) #3 MAIN__._omp_fn.0 (a.out+0x00400d27) #4 gomp_thread_start ../../../trunk/libgomp/team.c:120 (libgomp.so.1+0x0001756f) Note that not all problems with implicit close at the end are resolved so far, but that is a different problem. 2) I think the mutex init stuff can remain in insert_unit, just the locking needs to move out and below freeing unit_lock like you have done. I can change that one easily. Any other comments? Regards Thomas
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
On Fri, Sep 29, 2017 at 9:03 AM, Thomas Koenigwrote: > I wrote: > >> This gets rid of the thread sanitizer issue; the thread sanitizer >> output is clean. However, I would appreciate feedback about whether >> this approach (and my code) is correct. >> >> Regression-tested. >> > > > Here is an update: The previous version of the patch had a regression, > which is removed (with a test case) with the current version. > Also regression-tested. > > So, > >> Comments? Suggestions for improvements/other approaches? Close the PR >> as WONTFIX instead? OK for trunk? In general, I do think this is an issue that should be fixed. Multithreading is only going get more pervasive, and gfortran spewing false positives from tools such as threadsanitizer is definitely unhelpful. Now, for the patch itself, AFAICT the general approach is correct. However, I have a couple of questions and/or suggestions for improvements: 1) I'm confused why fbuf_destroy is modified. The fbuf structure is part of gfc_unit, and should be accessed with the same locking rules as the rest of the gfc_unit components. When closing the unit, I think the same should apply here, no? 2) I think the mutex init stuff can remain in insert_unit, just the locking needs to move out and below freeing unit_lock like you have done. -- Janne Blomqvist
Re: [patch, libfortran] Fix thead sanitizer issue with libgfortran
I wrote: This gets rid of the thread sanitizer issue; the thread sanitizer output is clean. However, I would appreciate feedback about whether this approach (and my code) is correct. Regression-tested. Here is an update: The previous version of the patch had a regression, which is removed (with a test case) with the current version. Also regression-tested. So, > Comments? Suggestions for improvements/other approaches? Close the PR > as WONTFIX instead? OK for trunk? Regards Thomas 2017-09-29 Thomas KoenigPR fortran/66756 * io/fbuf.c (fbuf_destroy): Add argument "locked". Lock unit before freeing the buffer if necessary. * io/unit.c (insert_unit): Do not create lock and lock, move to (gfc_get_unit): here; lock after insert_unit has succeded. (init_units): Do not unlock unit locks for stdin, stdout and stderr. 2017-09-29 Thomas Koenig PR fortran/66756 * gfortran.dg/openmp-close.f90: New test. Index: io/fbuf.c === --- io/fbuf.c (Revision 253162) +++ io/fbuf.c (Arbeitskopie) @@ -46,13 +46,17 @@ fbuf_init (gfc_unit *u, int len) void -fbuf_destroy (gfc_unit *u) +fbuf_destroy (gfc_unit *u, int locked) { if (u->fbuf == NULL) return; + if (locked) +__gthread_mutex_lock (>lock); free (u->fbuf->buf); free (u->fbuf); u->fbuf = NULL; + if (locked) +__gthread_mutex_unlock (>lock); } Index: io/fbuf.h === --- io/fbuf.h (Revision 253162) +++ io/fbuf.h (Arbeitskopie) @@ -47,7 +47,7 @@ struct fbuf extern void fbuf_init (gfc_unit *, int); internal_proto(fbuf_init); -extern void fbuf_destroy (gfc_unit *); +extern void fbuf_destroy (gfc_unit *, int); internal_proto(fbuf_destroy); extern int fbuf_reset (gfc_unit *); Index: io/unit.c === --- io/unit.c (Revision 253162) +++ io/unit.c (Arbeitskopie) @@ -221,23 +221,14 @@ insert (gfc_unit *new, gfc_unit *t) return t; } +/* insert_unit()-- Create a new node, insert it into the treap. It is assumed + that the caller holds unit_lock. */ -/* insert_unit()-- Create a new node, insert it into the treap. */ - static gfc_unit * insert_unit (int n) { gfc_unit *u = xcalloc (1, sizeof (gfc_unit)); u->unit_number = n; -#ifdef __GTHREAD_MUTEX_INIT - { -__gthread_mutex_t tmp = __GTHREAD_MUTEX_INIT; -u->lock = tmp; - } -#else - __GTHREAD_MUTEX_INIT_FUNCTION (>lock); -#endif - __gthread_mutex_lock (>lock); u->priority = pseudo_random (); unit_root = insert (u, unit_root); return u; @@ -361,9 +352,20 @@ retry: if (created) { - /* Newly created units have their lock held already - from insert_unit. Just unlock UNIT_LOCK and return. */ +#ifdef __GTHREAD_MUTEX_INIT + { + __gthread_mutex_t tmp = __GTHREAD_MUTEX_INIT; + p->lock = tmp; + } +#else + __GTHREAD_MUTEX_INIT_FUNCTION (>lock); +#endif __gthread_mutex_unlock (_lock); + + /* Nobody outside this address has seen this unit yet. We could safely + keep it unlocked until now. */ + + __gthread_mutex_lock (>lock); return p; } @@ -618,8 +620,6 @@ init_units (void) u->filename = strdup (stdin_name); fbuf_init (u, 0); - - __gthread_mutex_unlock (>lock); } if (options.stdout_unit >= 0) @@ -649,8 +649,6 @@ init_units (void) u->filename = strdup (stdout_name); fbuf_init (u, 0); - - __gthread_mutex_unlock (>lock); } if (options.stderr_unit >= 0) @@ -680,8 +678,6 @@ init_units (void) fbuf_init (u, 256); /* 256 bytes should be enough, probably not doing any kind of exotic formatting to stderr. */ - - __gthread_mutex_unlock (>lock); } /* Calculate the maximum file offset in a portable manner. @@ -719,7 +715,7 @@ close_unit_1 (gfc_unit *u, int locked) u->filename = NULL; free_format_hash_table (u); - fbuf_destroy (u); + fbuf_destroy (u, locked); if (u->unit_number <= NEWUNIT_START) newunit_free (u->unit_number); ! { dg-do run } ! { dg-require-effective-target fopenmp } ! { dg-additional-options "-fopenmp" } program main use omp_lib !$OMP PARALLEL NUM_THREADS(100) write (10,*) 'asdf' !$OMP END PARALLEL close(10,status="delete") end program main
[patch, libfortran] Fix thead sanitizer issue with libgfortran
Hello world, the attached patch fixes the problem reported in PR 66756: When opeing a file, the main lock for all units was acquired, the unit lock was acquired, and then the main lock was released and re-aquired. To the thread sanitizer, this is a lock-order inversion. One option would have been to simply close the bug, because this only occurs in opening a file, when the gfc_unit has not yet had a chance to escape to another thread. However, it appears that this causes trouble debugging parallel applications, hence this patch. What this patch does is to change the assumptions for insert_unit: Previously, this used to lock the newly created unit, and the caller had to unlock. Now, gfc_get_unit can do the locking after releasing the global lock. This gets rid of the thread sanitizer issue; the thread sanitizer output is clean. However, I would appreciate feedback about whether this approach (and my code) is correct. Regression-tested. Comments? Suggestions for improvements/other approaches? Close the PR as WONTFIX instead? OK for trunk? Regards Thomas 2017-09-28 Thomas KoenigPR fortran/66756 * io/fbuf.c (fbuf_destroy): Lock unit before freeing the buffer. * io/unit.c (insert_unit): Do not create lock and lock, move to (gfc_get_unit): here; lock after insert_unit has succeded. (init_units): Do not unlock unit locks for stdin, stdout and stderr. Index: io/fbuf.c === --- io/fbuf.c (Revision 253162) +++ io/fbuf.c (Arbeitskopie) @@ -50,9 +50,11 @@ fbuf_destroy (gfc_unit *u) { if (u->fbuf == NULL) return; + __gthread_mutex_lock (>lock); free (u->fbuf->buf); free (u->fbuf); u->fbuf = NULL; + __gthread_mutex_unlock (>lock); } Index: io/unit.c === --- io/unit.c (Revision 253162) +++ io/unit.c (Arbeitskopie) @@ -221,23 +221,14 @@ insert (gfc_unit *new, gfc_unit *t) return t; } +/* insert_unit()-- Create a new node, insert it into the treap. It is assumed + that the caller holds unit_lock. */ -/* insert_unit()-- Create a new node, insert it into the treap. */ - static gfc_unit * insert_unit (int n) { gfc_unit *u = xcalloc (1, sizeof (gfc_unit)); u->unit_number = n; -#ifdef __GTHREAD_MUTEX_INIT - { -__gthread_mutex_t tmp = __GTHREAD_MUTEX_INIT; -u->lock = tmp; - } -#else - __GTHREAD_MUTEX_INIT_FUNCTION (>lock); -#endif - __gthread_mutex_lock (>lock); u->priority = pseudo_random (); unit_root = insert (u, unit_root); return u; @@ -361,9 +352,20 @@ retry: if (created) { - /* Newly created units have their lock held already - from insert_unit. Just unlock UNIT_LOCK and return. */ +#ifdef __GTHREAD_MUTEX_INIT + { + __gthread_mutex_t tmp = __GTHREAD_MUTEX_INIT; + p->lock = tmp; + } +#else + __GTHREAD_MUTEX_INIT_FUNCTION (>lock); +#endif __gthread_mutex_unlock (_lock); + + /* Nobody outside this address has seen this unit yet. We could safely + keep it unlocked until now. */ + + __gthread_mutex_lock (>lock); return p; } @@ -618,8 +620,6 @@ init_units (void) u->filename = strdup (stdin_name); fbuf_init (u, 0); - - __gthread_mutex_unlock (>lock); } if (options.stdout_unit >= 0) @@ -649,8 +649,6 @@ init_units (void) u->filename = strdup (stdout_name); fbuf_init (u, 0); - - __gthread_mutex_unlock (>lock); } if (options.stderr_unit >= 0) @@ -680,8 +678,6 @@ init_units (void) fbuf_init (u, 256); /* 256 bytes should be enough, probably not doing any kind of exotic formatting to stderr. */ - - __gthread_mutex_unlock (>lock); } /* Calculate the maximum file offset in a portable manner.