[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #19 from Samuel Thibault ---
(ran it for an hour)
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #18 from Samuel Thibault ---
I have run the reproducer on all the various systems previously mentioned, with no issue so far, so it seems we're good with that fix.
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #17 from Samuel Thibault ---
Ah, configure.ac already detects which case that is. So we need a valgrind built against the proper glibc; that's why I didn't have the proper name.
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #16 from Samuel Thibault ---
> I would expect these to be covered by the default suppressions:
> {
>    helgrind-glibc2X-004
>    Helgrind:Race
>    obj:*/lib*/libc-2.*so*
> }

Ah, but glibc renamed libc-2.*so* into just libc.so.6, so that suppression no longer matches. Duplicating the entry with libc.so.* does indeed seem to avoid the warnings. I'll try to test on more machines and archs.
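For reference, the duplicated entry described here would presumably look like the sketch below. The entry name is hypothetical and the object glob simply mirrors the existing one with the new library name, so the exact suppression that eventually lands in valgrind's defaults may differ:

    # Hypothetical companion to helgrind-glibc2X-004 for the renamed libc.so.6
    {
       helgrind-glibc2X-004-libc-so-6
       Helgrind:Race
       obj:*/lib*/libc.so*
    }

Such an entry can also be tried without rebuilding valgrind by putting it in a file and passing --suppressions=that-file on the helgrind command line.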
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #14 from Samuel Thibault ---
FYI, it's glibc 2.36:
- pthread_mutex_lock.c:94 is where it checks assert (mutex->__data.__owner == 0); after obtaining the lock
- pthread_mutex_lock.c:182 is where it sets mutex->__data.__owner after obtaining the lock
- pthread_mutex_unlock.c:62 is where it clears mutex->__data.__owner before releasing the lock
- pthread_mutex_unlock.c:65 is where it does --mutex->__data.__nusers; before releasing the lock
(a simplified sketch of this bookkeeping follows after this comment)

The other cond-related lines are more complex to describe, but they'll probably get fixed the same way the mutex ones would. I guess the disabling of checking somehow perturbs helgrind's history record (I guess that's part of the "This puts it in the same state as new memory allocated by this thread -- that is, basically owned exclusively by this thread." comment for the VALGRIND_HG_ENABLE_CHECKING macro).
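To make the pattern concrete, here is a minimal C sketch (not the actual glibc code; names simplified) of the owner/nusers bookkeeping those four lines perform around the lock word itself; these are plain, non-atomic stores and loads, which is what helgrind is reporting on:

    /* Minimal sketch, NOT the real glibc code: simplified stand-ins for the
       __data.__owner / __data.__nusers bookkeeping referenced above. */
    #include <assert.h>

    struct sketch_mutex_data {
        int owner;            /* stands in for mutex->__data.__owner */
        unsigned int nusers;  /* stands in for mutex->__data.__nusers */
    };

    static void lock_bookkeeping(struct sketch_mutex_data *m, int self_tid)
    {
        /* ...the lock word itself is acquired before this point (elided)... */
        assert(m->owner == 0);    /* pthread_mutex_lock.c:94 */
        m->owner = self_tid;      /* pthread_mutex_lock.c:182 */
        ++m->nusers;              /* counterpart of the decrement below */
    }

    static void unlock_bookkeeping(struct sketch_mutex_data *m)
    {
        m->owner = 0;             /* pthread_mutex_unlock.c:62 */
        --m->nusers;              /* pthread_mutex_unlock.c:65 */
        /* ...the lock word itself is released after this point (elided)... */
    }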
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #12 from Samuel Thibault ---
Yes, that avoids that exact issue, but then a flurry of new ones pops up that I was not getting before:

==2511392== Possible data race during read of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F31A2: pthread_mutex_lock@@GLIBC_2.2.5 (pthread_mutex_lock.c:94)
==2511392==    by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==    by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x48EF26D: __pthread_cond_wait_common (pthread_cond_wait.c:419)
==2511392==    by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392==
==2511392==
==2511392==
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F31B9: pthread_mutex_lock@@GLIBC_2.2.5 (pthread_mutex_lock.c:182)
==2511392==    by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==    by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x48EF26D: __pthread_cond_wait_common (pthread_cond_wait.c:419)
==2511392==    by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392==
==2511392==
==2511392==
==2511392== Lock at 0x10C120 was first observed
==2511392==    at 0x484CFE9: pthread_mutex_init (hg_intercepts.c:818)
==2511392==    by 0x1092EE: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c120 is 0 bytes inside data symbol "mutex"
==2511392==
==2511392== Possible data race during write of size 8 at 0x10C0E8 by thread #2
==2511392== Locks held: 1, at address 0x10C120
==2511392==    at 0x48EEE29: __atomic_wide_counter_add_relaxed (atomic_wide_counter.h:57)
==2511392==    by 0x48EEE29: __condvar_add_g1_start_relaxed (pthread_cond_common.c:52)
==2511392==    by 0x48EEE29: __condvar_quiesce_and_switch_g1 (pthread_cond_common.c:294)
==2511392==    by 0x48EEE29: pthread_cond_signal@@GLIBC_2.3.2 (pthread_cond_signal.c:77)
==2511392==    by 0x4848FB8: pthread_cond_signal_WRK (hg_intercepts.c:1570)
==2511392==    by 0x10928B: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==
==2511392== This conflicts with a previous read of size 8 by thread #1
==2511392== Locks held: none
==2511392==    at 0x48EF39E: __atomic_wide_counter_load_relaxed (atomic_wide_counter.h:30)
==2511392==    by 0x48EF39E: __condvar_load_g1_start_relaxed (pthread_cond_common.c:46)
==2511392==    by 0x48EF39E: __pthread_cond_wait_common (pthread_cond_wait.c:486)
==2511392==    by 0x48EF39E: pthread_cond_wait@@GLIBC_2.3.2 (pthread_cond_wait.c:618)
==2511392==    by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==    by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c0e8 is 8 bytes inside data symbol "cond"
==2511392==
==2511392==
==2511392==
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==    at 0x48F4A47: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==2511392==    by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==    by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==    by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==    by 0x496F8CF: clone (clone.S:100)
==2511392==
==251139
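For context, the stack traces above (f() taking the lock, signalling the condition and unlocking; main() initialising the mutex, waiting on the condition and later destroying it) suggest a reproducer shaped roughly like the sketch below. This is an assumed reconstruction, not the actual ./test program attached to the bug, which presumably repeats the sequence many times to hit the race:

    /* Hypothetical reproducer sketch matching the frames in the reports above;
       not the actual test program from this bug. */
    #include <pthread.h>

    static pthread_mutex_t mutex;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static int done;

    static void *f(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&mutex);     /* mutex_lock_WRK frames */
        done = 1;
        pthread_cond_signal(&cond);     /* pthread_cond_signal_WRK frame */
        pthread_mutex_unlock(&mutex);   /* mutex_unlock_WRK frame */
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_mutex_init(&mutex, NULL);        /* pthread_mutex_init frame */
        pthread_create(&t, NULL, f, NULL);
        pthread_mutex_lock(&mutex);
        while (!done)
            pthread_cond_wait(&cond, &mutex);    /* pthread_cond_wait_WRK frame */
        pthread_mutex_unlock(&mutex);
        pthread_join(t, NULL);
        pthread_mutex_destroy(&mutex);           /* the destroy the original report fires on */
        return 0;
    }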
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #8 from Samuel Thibault ---
(with the various versions of valgrind between 3.9 and 3.19)
[valgrind] [Bug 327548] false positive while destroying mutex
https://bugs.kde.org/show_bug.cgi?id=327548

Samuel Thibault changed:

           What      |Removed     |Added
----------------------------------------------------------------------------
         Resolution  |WORKSFORME  |---
     Ever confirmed  |0           |1
             Status  |RESOLVED    |REOPENED

--- Comment #7 from Samuel Thibault ---
To provide various datapoints, I have tried
    valgrind --tool=helgrind ./test
on:
- Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz
- AMD EPYC 7352 24-Core Processor
- AMD EPYC 7502 32-Core Processor
with glibc 2.35. I got the hit

==420467== Helgrind, a thread error detector
==420467== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==420467== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==420467== Command: ./test
==420467==
==420467== ---Thread-Announcement--
==420467==
==420467== Thread #1 is the program's root thread
==420467==
==420467== ---Thread-Announcement--
==420467==
==420467== Thread #2 was created
==420467==    at 0x49792FF: clone (clone.S:76)
==420467==    by 0x497A146: __clone_internal (clone-internal.c:83)
==420467==    by 0x48F6484: create_thread (pthread_create.c:295)
==420467==    by 0x48F6F78: pthread_create@@GLIBC_2.34 (pthread_create.c:828)
==420467==    by 0x484E5D7: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x1092DA: main (in /net/inria/home/sthibault/test)
==420467==
==420467==
==420467==
==420467== Possible data race during read of size 1 at 0x10C128 by thread #1
==420467== Locks held: none
==420467==    at 0x484B225: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x484B4CA: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x10937C: main (in /net/inria/home/sthibault/test)
==420467==
==420467== This conflicts with a previous write of size 4 by thread #2
==420467== Locks held: none
==420467==    at 0x48FB4D8: __pthread_mutex_unlock_usercnt (pthread_mutex_unlock.c:62)
==420467==    by 0x484BCE8: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x10929A: f (in /net/inria/home/sthibault/test)
==420467==    by 0x484E7D6: ??? (in /usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==    by 0x48F6849: start_thread (pthread_create.c:442)
==420467==    by 0x497930F: clone (clone.S:100)
==420467==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==420467==

immediately on almost all of them; only the AMD EPYC 7352 24-Core Processor took 1 minute to reproduce.

On an AMD Opteron(tm) Processor 6174 with glibc 2.28 I couldn't reproduce it within an hour.

On an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, I tried with various glibcs (2.19, 2.24, 2.28, 2.31, 2.36); with all versions it happens within half an hour, sometimes within a minute.

On an arm64 with glibc 2.36 it appeared immediately.
On a mips64el Cavium Octeon III V0.2 FPU V0.0 with glibc 2.36 it appeared immediately.
On a ppc64el POWER8 with glibc 2.36 it appeared within a minute.
On an s390x it appeared in 8 minutes.