[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #19 from Samuel Thibault  ---
(ran it for an hour)


[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #18 from Samuel Thibault  ---
I have run the reproducer on all the various systems previously mentioned, with
no issue so far, so it seems we're good with that fix.


[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #17 from Samuel Thibault  ---
Ah, configure.ac already detects which case that is. So we need a valgrind
built against the proper glibc; that's why I didn't have the proper name.


[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #16 from Samuel Thibault  ---
> I would expect these to be covered by the default suppressions:
> {
>    helgrind-glibc2X-004
>    Helgrind:Race
>    obj:*/lib*/libc-2.*so*
> }

Ah, but glibc renamed libc-2.*so* to just libc.so.6, so that suppression
won't take effect. Duplicating the entry with libc.so.* does indeed seem to
avoid the warnings. I'll try to test on more machines and archs.
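
For illustration, the duplicated entry would look something like this (the
suppression name here is only a placeholder, not necessarily what the final
entry should be called):

{
   helgrind-glibc-libc.so-race
   Helgrind:Race
   obj:*/lib*/libc.so.*
}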


[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #14 from Samuel Thibault  ---
FYI, it's glibc 2.36:

- pthread_mutex_lock.c:94 is where it checks assert (mutex->__data.__owner ==
0); after obtaining the lock
- pthread_mutex_lock.c:182 is where it sets mutex->__data.__owner after
obtaining the lock
- pthread_mutex_unlock.c:62 is where it clears mutex->__data.__owner before
releasing the lock
- pthread_mutex_unlock.c:65 is where it does --mutex->__data.__nusers; before
releasing the lock

The other cond-related lines are more complex to describe, but they'll probably
get fixed the same way as the mutex ones.
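
Concretely, the accesses at those lines amount to roughly the following plain
(non-atomic) operations; this is a paraphrase of the glibc 2.36 sources, not a
verbatim quote:

/* pthread_mutex_lock.c:94, after obtaining the lock: */
assert (mutex->__data.__owner == 0);

/* pthread_mutex_lock.c:182, after obtaining the lock
   (id being the locking thread's tid): */
mutex->__data.__owner = id;

/* pthread_mutex_unlock.c:62, before releasing the lock: */
mutex->__data.__owner = 0;

/* pthread_mutex_unlock.c:65, also before releasing the lock: */
--mutex->__data.__nusers;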

I guess the disabling of checking somehow perturbs helgrind's history record
(I guess that's part of the "This puts it in the same state as new memory
allocated by this thread -- that is, basically owned exclusively by this
thread." comment for the VALGRIND_HG_ENABLE_CHECKING macro).
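
For context, the kind of client-request wrapping being discussed looks roughly
like this; it is only a sketch using the helgrind.h requests, not the actual
change that was tried, and the placement (user code here rather than
hg_intercepts.c) is an assumption:

#include <pthread.h>
#include <valgrind/helgrind.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

static void destroy_mutex_quietly(void)
{
    /* Stop helgrind from checking the mutex's memory while it is torn down,
       then re-enable checking.  Per helgrind.h, re-enabling puts the range in
       the same state as new memory allocated by this thread, which is
       presumably what perturbs the history record as guessed above. */
    VALGRIND_HG_DISABLE_CHECKING(&mutex, sizeof mutex);
    pthread_mutex_destroy(&mutex);
    VALGRIND_HG_ENABLE_CHECKING(&mutex, sizeof mutex);
}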


[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #12 from Samuel Thibault  ---
Yes, that avoids that exact issue, but then a flurry of new ones pops up that I
was not getting before:

==2511392== Possible data race during read of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==at 0x48F31A2: pthread_mutex_lock@@GLIBC_2.2.5
(pthread_mutex_lock.c:94)
==2511392==by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==at 0x48F4A47: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==2511392==by 0x48EF26D: __pthread_cond_wait_common
(pthread_cond_wait.c:419)
==2511392==by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2
(pthread_cond_wait.c:618)
==2511392==by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== 
==2511392== 
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==at 0x48F31B9: pthread_mutex_lock@@GLIBC_2.2.5
(pthread_mutex_lock.c:182)
==2511392==by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==at 0x48F4A47: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==2511392==by 0x48EF26D: __pthread_cond_wait_common
(pthread_cond_wait.c:419)
==2511392==by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2
(pthread_cond_wait.c:618)
==2511392==by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== 
==2511392== 
==2511392==  Lock at 0x10C120 was first observed
==2511392==at 0x484CFE9: pthread_mutex_init (hg_intercepts.c:818)
==2511392==by 0x1092EE: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c120 is 0 bytes inside data symbol "mutex"
==2511392== 
==2511392== Possible data race during write of size 8 at 0x10C0E8 by thread #2
==2511392== Locks held: 1, at address 0x10C120
==2511392==at 0x48EEE29: __atomic_wide_counter_add_relaxed
(atomic_wide_counter.h:57)
==2511392==by 0x48EEE29: __condvar_add_g1_start_relaxed
(pthread_cond_common.c:52)
==2511392==by 0x48EEE29: __condvar_quiesce_and_switch_g1
(pthread_cond_common.c:294)
==2511392==by 0x48EEE29: pthread_cond_signal@@GLIBC_2.3.2
(pthread_cond_signal.c:77)
==2511392==by 0x4848FB8: pthread_cond_signal_WRK (hg_intercepts.c:1570)
==2511392==by 0x10928B: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous read of size 8 by thread #1
==2511392== Locks held: none
==2511392==at 0x48EF39E: __atomic_wide_counter_load_relaxed
(atomic_wide_counter.h:30)
==2511392==by 0x48EF39E: __condvar_load_g1_start_relaxed
(pthread_cond_common.c:46)
==2511392==by 0x48EF39E: __pthread_cond_wait_common
(pthread_cond_wait.c:486)
==2511392==by 0x48EF39E: pthread_cond_wait@@GLIBC_2.3.2
(pthread_cond_wait.c:618)
==2511392==by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c0e8 is 8 bytes inside data symbol "cond"
==2511392== 
==2511392== 
==2511392== 
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==at 0x48F4A47: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==2511392==by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==251139

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-02 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #8 from Samuel Thibault  ---
(with the various versions of valgrind between 3.9 and 3.19)


[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-02 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

Samuel Thibault  changed:

            What|Removed    |Added
 --------------------------------------------
      Resolution|WORKSFORME |---
  Ever confirmed|0          |1
          Status|RESOLVED   |REOPENED

--- Comment #7 from Samuel Thibault  ---
To provide various data points, I have run

valgrind --tool=helgrind ./test

on:

- Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz
- AMD EPYC 7352 24-Core Processor
- AMD EPYC 7502 32-Core Processor

with glibc 2.35.

I got the hit:

==420467== Helgrind, a thread error detector
==420467== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==420467== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==420467== Command: ./test
==420467== 
==420467== ---Thread-Announcement--
==420467== 
==420467== Thread #1 is the program's root thread
==420467== 
==420467== ---Thread-Announcement--
==420467== 
==420467== Thread #2 was created
==420467==at 0x49792FF: clone (clone.S:76)
==420467==by 0x497A146: __clone_internal (clone-internal.c:83)
==420467==by 0x48F6484: create_thread (pthread_create.c:295)
==420467==by 0x48F6F78: pthread_create@@GLIBC_2.34 (pthread_create.c:828)
==420467==by 0x484E5D7: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x1092DA: main (in /net/inria/home/sthibault/test)
==420467== 
==420467== 
==420467== 
==420467== Possible data race during read of size 1 at 0x10C128 by thread #1
==420467== Locks held: none
==420467==at 0x484B225: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x484B4CA: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x10937C: main (in /net/inria/home/sthibault/test)
==420467== 
==420467== This conflicts with a previous write of size 4 by thread #2
==420467== Locks held: none
==420467==at 0x48FB4D8: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==420467==by 0x484BCE8: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x10929A: f (in /net/inria/home/sthibault/test)
==420467==by 0x484E7D6: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x48F6849: start_thread (pthread_create.c:442)
==420467==by 0x497930F: clone (clone.S:100)
==420467==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==420467== 

immediately on almost all of them; only the AMD EPYC 7352 24-Core Processor
took about a minute to reproduce.

On an AMD Opteron(tm) Processor 6174 with glibc 2.28 I couldn't reproduce it
within an hour.

On an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, I tried various glibcs (2.19,
2.24, 2.28, 2.31, 2.36); with all of them it happens within half an hour,
sometimes within a minute.

On an arm64 with glibc 2.36 it appeared immediately.

On a mips64el Cavium Octeon III V0.2 FPU V0.0 with glibc 2.36 it appeared
immediately.

On a ppc64el POWER8 with glibc 2.36 it appeared within a minute.

On an s390x it appeared in 8 minutes.
