[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

Paul Floyd  changed:

 What       |Removed   |Added
 Resolution |---       |FIXED
 Status     |REOPENED  |RESOLVED

--- Comment #20 from Paul Floyd  ---
Now fixed:

commit 74e20007467b930fb14264519f74a45179b5f740 (HEAD -> master, origin/master,
origin/HEAD)
Author: Paul Floyd 
Date:   Tue Jan 3 21:28:42 2023 +0100

Bug 327548 - false positive while destroying mutex

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #19 from Samuel Thibault  ---
(ran it for an hour)

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #18 from Samuel Thibault  ---
I have run the reproducer on all the various systems previously mentioned, with
no issue so far, so it seems we're good with that fix.

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #17 from Samuel Thibault  ---
Ah, configure.ac already detects which case that is. So we need a valgrind
built against the matching glibc; that's why I didn't have the proper name.

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #16 from Samuel Thibault  ---
> I would expect these to be covered by the default suppressions:
> {
>helgrind-glibc2X-004
>Helgrind:Race
>obj:*/lib*/libc-2.*so*
> }

Ah, but glibc renamed libc-2.*so* to just libc.so.6, so that suppression
won't take effect. Duplicating the entry with libc.so.* does indeed seem to
avoid the warnings. I'll try to test on more machines and archs.
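
For instance, duplicating the entry with a pattern that matches the new name
should work; something along these lines (the entry name is illustrative,
not one that ships with valgrind):

{
   helgrind-glibc2X-004a
   Helgrind:Race
   obj:*/lib*/libc.so*
}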

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #15 from Paul Floyd  ---
(In reply to Samuel Thibault from comment #12)
> Yes, that avoids that exact issue, but then a flurry of new ones pops up
> that I was not getting before:
> 
> ==2511392== Possible data race during read of size 4 at 0x10C128 by thread #2
> ==2511392== Locks held: none
> ==2511392==at 0x48F31A2: pthread_mutex_lock@@GLIBC_2.2.5
> (pthread_mutex_lock.c:94)

[snip]

I would expect these to be covered by the default suppressions:
{
   helgrind-glibc2X-004
   Helgrind:Race
   obj:*/lib*/libc-2.*so*
}

and

{
   helgrind-glibc2X-005
   Helgrind:Race
   obj:*/lib*/libpthread-2.*so*
}

which are very broad and cover most of the pthread internals.

I'll try to see if I can test on 2.36.

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #14 from Samuel Thibault  ---
FYI, it's glibc 2.36:

- pthread_mutex_lock.c:94 is when it checks assert (mutex->__data.__owner ==
0); after obtaining the lock
- pthread_mutex_lock.c:182 is when it sets mutex->__data.__owner after
obtaining the lock
- pthread_mutex_unlock.c:62 is when it clears mutex->__data.__owner before
releasing the lock
- pthread_mutex_unlock.c:65 is when it does --mutex->__data.__nusers; before
releasing the lock (see the sketch below)
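
In rough C terms, the shape of those accesses is something like the
following sketch (a simplified illustration of the description above, not
the actual glibc source; the lock word handling is reduced to a spinlock):

#include <assert.h>
#include <stdatomic.h>

/* The lock word itself is handled atomically, but the bookkeeping
   fields next to it use plain, unsynchronised accesses. */
struct sketch_mutex {
   atomic_int lock;       /* lock word, accessed atomically     */
   int owner;             /* plain accesses flagged by Helgrind */
   unsigned int nusers;   /* likewise                           */
};

static void sketch_lock(struct sketch_mutex *m, int tid)
{
   while (atomic_exchange(&m->lock, 1))   /* acquire the lock word */
      ;
   assert(m->owner == 0);  /* plain read  (pthread_mutex_lock.c:94)  */
   m->owner = tid;         /* plain write (pthread_mutex_lock.c:182) */
   ++m->nusers;
}

static void sketch_unlock(struct sketch_mutex *m)
{
   m->owner = 0;           /* plain write (pthread_mutex_unlock.c:62)     */
   --m->nusers;            /* plain decrement (pthread_mutex_unlock.c:65) */
   atomic_store(&m->lock, 0);              /* release the lock word */
}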

The other, cond-related lines are more complex to describe, but they'll
probably get fixed the same way the mutex ones will.

I guess the disabling of checking somehow perturbs helgrind's history record
(I guess that's part of the "This puts it in the same state as new memory
allocated by this thread -- that is, basically owned exclusively by this
thread." comment for the VALGRIND_HG_ENABLE_CHECKING macro).

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #13 from Paul Floyd  ---
Looking at the glibc source:

pthread_mutex_destroy is fairly simple. It does an atomic read of the kind
and a plain read of nusers, and returns EBUSY if nusers is not 0.

It then sets kind to an invalid value, -1.
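
In rough C terms, that logic amounts to the following sketch (an
illustration of the description above, not the glibc source):

#include <errno.h>
#include <stdatomic.h>

struct destroy_sketch {
   atomic_int kind;
   unsigned int nusers;
};

static int sketch_destroy(struct destroy_sketch *m)
{
   (void) atomic_load(&m->kind);   /* atomic read of the kind        */
   if (m->nusers != 0)             /* plain read of nusers           */
      return EBUSY;
   atomic_store(&m->kind, -1);     /* poison the kind: mutex is dead */
   return 0;
}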

pthread_mutex_unlock is a lot more complicated. It also does atomic reads of
the kind, decrements nusers, checks the lock and then does a CAS on the lock.

If I run with --default-suppressions=no then there are huge numbers of
hazards, but none related to pthread_mutex_destroy. I think that is because
Helgrind sees that the lock order is such that pthread_mutex_destroy is
called in a serial section.

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #12 from Samuel Thibault  ---
Yes, that avoids that exact issue, but then a flurry of new ones pops up
that I was not getting before:

==2511392== Possible data race during read of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==at 0x48F31A2: pthread_mutex_lock@@GLIBC_2.2.5
(pthread_mutex_lock.c:94)
==2511392==by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==at 0x48F4A47: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==2511392==by 0x48EF26D: __pthread_cond_wait_common
(pthread_cond_wait.c:419)
==2511392==by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2
(pthread_cond_wait.c:618)
==2511392==by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== 
==2511392== 
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==at 0x48F31B9: pthread_mutex_lock@@GLIBC_2.2.5
(pthread_mutex_lock.c:182)
==2511392==by 0x48487F8: mutex_lock_WRK (hg_intercepts.c:937)
==2511392==by 0x109272: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write of size 4 by thread #1
==2511392== Locks held: none
==2511392==at 0x48F4A47: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==2511392==by 0x48EF26D: __pthread_cond_wait_common
(pthread_cond_wait.c:419)
==2511392==by 0x48EF26D: pthread_cond_wait@@GLIBC_2.3.2
(pthread_cond_wait.c:618)
==2511392==by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==2511392== 
==2511392== 
==2511392== 
==2511392==  Lock at 0x10C120 was first observed
==2511392==at 0x484CFE9: pthread_mutex_init (hg_intercepts.c:818)
==2511392==by 0x1092EE: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c120 is 0 bytes inside data symbol "mutex"
==2511392== 
==2511392== Possible data race during write of size 8 at 0x10C0E8 by thread #2
==2511392== Locks held: 1, at address 0x10C120
==2511392==at 0x48EEE29: __atomic_wide_counter_add_relaxed
(atomic_wide_counter.h:57)
==2511392==by 0x48EEE29: __condvar_add_g1_start_relaxed
(pthread_cond_common.c:52)
==2511392==by 0x48EEE29: __condvar_quiesce_and_switch_g1
(pthread_cond_common.c:294)
==2511392==by 0x48EEE29: pthread_cond_signal@@GLIBC_2.3.2
(pthread_cond_signal.c:77)
==2511392==by 0x4848FB8: pthread_cond_signal_WRK (hg_intercepts.c:1570)
==2511392==by 0x10928B: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous read of size 8 by thread #1
==2511392== Locks held: none
==2511392==at 0x48EF39E: __atomic_wide_counter_load_relaxed
(atomic_wide_counter.h:30)
==2511392==by 0x48EF39E: __condvar_load_g1_start_relaxed
(pthread_cond_common.c:46)
==2511392==by 0x48EF39E: __pthread_cond_wait_common
(pthread_cond_wait.c:486)
==2511392==by 0x48EF39E: pthread_cond_wait@@GLIBC_2.3.2
(pthread_cond_wait.c:618)
==2511392==by 0x484BB03: pthread_cond_wait_WRK (hg_intercepts.c:1291)
==2511392==by 0x10933B: main (in /net/inria/home/sthibault/test)
==2511392==  Address 0x10c0e8 is 8 bytes inside data symbol "cond"
==2511392== 
==2511392== 
==2511392== 
==2511392== Possible data race during write of size 4 at 0x10C128 by thread #2
==2511392== Locks held: none
==2511392==at 0x48F4A47: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==2511392==by 0x4848DD8: mutex_unlock_WRK (hg_intercepts.c:1184)
==2511392==by 0x10929A: f (in /net/inria/home/sthibault/test)
==2511392==by 0x484B966: mythread_wrapper (hg_intercepts.c:406)
==2511392==by 0x48EFFD3: start_thread (pthread_create.c:442)
==2511392==by 0x496F8CF: clone (clone.S:100)
==2511392== 
==2511392== This conflicts with a previous write 

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #11 from Paul Floyd  ---
This is the same as what Philippe suggested in the e-mail thread, with
similar protection added for cond vars.

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #10 from Paul Floyd  ---
Can you try this patch please?

diff --git a/helgrind/hg_intercepts.c b/helgrind/hg_intercepts.c
index 5a83996..8c98e4e 100644
--- a/helgrind/hg_intercepts.c
+++ b/helgrind/hg_intercepts.c
@@ -879,7 +879,9 @@ static int mutex_destroy_WRK(pthread_mutex_t *mutex)
 
    if (mutex != NULL) {
       static const pthread_mutex_t mutex_init = PTHREAD_MUTEX_INITIALIZER;
+      VALGRIND_HG_DISABLE_CHECKING(mutex, sizeof(*mutex));
       mutex_is_init = my_memcmp(mutex, &mutex_init, sizeof(*mutex)) == 0;
+      VALGRIND_HG_ENABLE_CHECKING(mutex, sizeof(*mutex));
    } else {
       mutex_is_init = 0;
    }
@@ -1785,7 +1787,9 @@ static int pthread_cond_destroy_WRK(pthread_cond_t* cond)
 
    if (cond != NULL) {
       const pthread_cond_t cond_init = PTHREAD_COND_INITIALIZER;
+      VALGRIND_HG_DISABLE_CHECKING(cond, sizeof(*cond));
       cond_is_init = my_memcmp(cond, &cond_init, sizeof(*cond)) == 0;
+      VALGRIND_HG_ENABLE_CHECKING(cond, sizeof(*cond));
    } else {
       cond_is_init = 0;
    }
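
For reference, VALGRIND_HG_DISABLE_CHECKING and VALGRIND_HG_ENABLE_CHECKING
are the Helgrind client requests from valgrind/helgrind.h. The same pattern
in user code would look roughly like this (the helper name is hypothetical):

#include <string.h>
#include <pthread.h>
#include <valgrind/helgrind.h>

/* Tell Helgrind to stop race-checking the mutex's bytes while we do a
   deliberately unsynchronised comparison, then re-enable checking. */
static int looks_like_initialiser(const pthread_mutex_t *mutex)
{
   static const pthread_mutex_t mutex_init = PTHREAD_MUTEX_INITIALIZER;
   int is_init;
   VALGRIND_HG_DISABLE_CHECKING(mutex, sizeof(*mutex));
   is_init = memcmp(mutex, &mutex_init, sizeof(*mutex)) == 0;
   VALGRIND_HG_ENABLE_CHECKING(mutex, sizeof(*mutex));
   return is_init;
}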

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #9 from Paul Floyd  ---
I can reproduce this on a RHEL 7.6 machine.

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-02 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #8 from Samuel Thibault  ---
(with the various versions of valgrind between 3.9 and 3.19)

[valgrind] [Bug 327548] false positive while destroying mutex

2023-01-02 Thread Samuel Thibault
https://bugs.kde.org/show_bug.cgi?id=327548

Samuel Thibault  changed:

 What           |Removed     |Added
 Resolution     |WORKSFORME  |---
 Ever confirmed |0           |1
 Status         |RESOLVED    |REOPENED

--- Comment #7 from Samuel Thibault  ---
To provide various datapoints, I have run

valgrind --tool=helgrind ./test

on:

- Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
- Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
- Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz
- AMD EPYC 7352 24-Core Processor
- AMD EPYC 7502 32-Core Processor

with glibc 2.35

I got this hit:

==420467== Helgrind, a thread error detector
==420467== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
==420467== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==420467== Command: ./test
==420467== 
==420467== ---Thread-Announcement--
==420467== 
==420467== Thread #1 is the program's root thread
==420467== 
==420467== ---Thread-Announcement--
==420467== 
==420467== Thread #2 was created
==420467==at 0x49792FF: clone (clone.S:76)
==420467==by 0x497A146: __clone_internal (clone-internal.c:83)
==420467==by 0x48F6484: create_thread (pthread_create.c:295)
==420467==by 0x48F6F78: pthread_create@@GLIBC_2.34 (pthread_create.c:828)
==420467==by 0x484E5D7: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x1092DA: main (in /net/inria/home/sthibault/test)
==420467== 
==420467== 
==420467== 
==420467== Possible data race during read of size 1 at 0x10C128 by thread #1
==420467== Locks held: none
==420467==at 0x484B225: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x484B4CA: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x10937C: main (in /net/inria/home/sthibault/test)
==420467== 
==420467== This conflicts with a previous write of size 4 by thread #2
==420467== Locks held: none
==420467==at 0x48FB4D8: __pthread_mutex_unlock_usercnt
(pthread_mutex_unlock.c:62)
==420467==by 0x484BCE8: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x10929A: f (in /net/inria/home/sthibault/test)
==420467==by 0x484E7D6: ??? (in
/usr/libexec/valgrind/vgpreload_helgrind-amd64-linux.so)
==420467==by 0x48F6849: start_thread (pthread_create.c:442)
==420467==by 0x497930F: clone (clone.S:100)
==420467==  Address 0x10c128 is 8 bytes inside data symbol "mutex"
==420467== 

It appeared immediately on almost all of them; only the AMD EPYC 7352
24-Core Processor took about a minute to reproduce.

On an AMD Opteron(tm) Processor 6174 with glibc 2.28 I couldn't reproduce it
within an hour.

On an Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz, I tried with various
glibcs: 2.19, 2.24, 2.28, 2.31, 2.36; with all versions it happens within
half an hour, sometimes within a minute.

On an arm64 with glibc 2.36 it appeared immediately.

On a mips64el Cavium Octeon III V0.2 FPU V0.0 with glibc 2.36 it appeared
immediately.

On a ppc64el POWER8 with glibc 2.36 it appeared within a minute.

On an s390x it appeared in 8 minutes.
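
For reference, a minimal reproducer consistent with the stack traces in this
thread might look like the following sketch (a reconstruction; the actual
./test source is not included in these mails):

#include <pthread.h>

static pthread_mutex_t mutex;
static pthread_cond_t  cond;
static int done;

static void *f(void *arg)
{
   (void) arg;
   pthread_mutex_lock(&mutex);
   done = 1;
   pthread_cond_signal(&cond);
   pthread_mutex_unlock(&mutex);
   return NULL;
}

int main(void)
{
   for (;;) {   /* the false positive only shows up after some time */
      pthread_t t;
      pthread_mutex_init(&mutex, NULL);
      pthread_cond_init(&cond, NULL);
      done = 0;
      pthread_create(&t, NULL, f, NULL);
      pthread_mutex_lock(&mutex);
      while (!done)
         pthread_cond_wait(&cond, &mutex);
      pthread_mutex_unlock(&mutex);
      pthread_join(&t, NULL);
      pthread_cond_destroy(&cond);
      pthread_mutex_destroy(&mutex);   /* the reported race points here */
   }
}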

[valgrind] [Bug 327548] false positive while destroying mutex

2022-12-31 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

Paul Floyd  changed:

 What       |Removed   |Added
 Status     |REPORTED  |RESOLVED
 Resolution |---       |WORKSFORME

--- Comment #6 from Paul Floyd  ---
Reopen this if anyone else can reproduce the problem.

[valgrind] [Bug 327548] false positive while destroying mutex

2022-12-30 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

--- Comment #5 from Paul Floyd  ---
I ran this for a couple of hours without any errors. It could be related to the
hardware.

[valgrind] [Bug 327548] false positive while destroying mutex

2022-12-27 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=327548

Paul Floyd  changed:

 What |Removed |Added
 CC   |        |pjfl...@wanadoo.fr

--- Comment #4 from Paul Floyd  ---
I don't get an error on FreeBSD. I'll check if the problem still exists on
Linux.
