Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-23 Thread ISHIKAWA,chiaki

John, Paul, and Mark

Thank you for the information.

Eventually, I obtained the valgrind source code from git.
(Debian is a bit slow in updating tools. It is very conservative.)

It contained the following.

#if defined(VGO_linux)
 STRNCMP(VG_Z_LIBC_SONAME, strncmp)
 STRNCMP(VG_Z_LIBC_SONAME, __GI_strncmp)
 STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse2)
 STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse42)
 STRNCMP(VG_Z_LD_LINUX_SO_2, strncmp)                    <---
 STRNCMP(VG_Z_LD_LINUX_X86_64_SO_2, strncmp)  <---

#elif defined(VGO_freebsd)

For now, with this version, I no longer get the warning for strncmp.

While looking for false-positive warnings in the new log,
I caught a real issue in my patch for the thunderbird mail client.
It was triggered by the slowdown caused by valgrind.
This was not quite intentional, but the slowdown is surely helpful for simulating
abnormal conditions that trigger unforeseen, uncaught error situations.

The test is still running.
Hopefully there will be no more error-related issues.

BTW, it seems the timing of valgrind has changed since 3.18.
$ valgrind --version
valgrind-3.20.0.GIT

I mean valgrind may take a bit longer to simulate the program
execution. (I am not sure. All I can say is that
the elapsed time seems different. It may be related to the fact that the
false-positive stack trace for strncmp is no longer printed, etc.)

I probably need to tweak the timeout values for the test.
It could be that many smaller tests were not being run because of timeouts,
and I did not realize it because I was focused on the real memory errors
reported by valgrind.

Thank you again.

Chiaki




___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users







Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-21 Thread Mark Wielaard
Hi,

On Sat, May 21, 2022 at 02:42:22AM -0700, John Reiser wrote:
> > 143:39.43 GECKO(115765) ==115769== Invalid read of size 8
> > 143:39.64 GECKO(115765) ==115769==    at 0x4021BF4: strncmp (strcmp.S:175)
> > 143:39.64 GECKO(115765) ==115769==    by 0x400655D: is_dst (dl-load.c:214)
> 
> This indicates that 'strncmp' should be re-directed from ld-linux-x86-64.so.2:
> =
> diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c
> index 3b42b3a87..8272a3ae7 100644
> --- a/shared/vg_replace_strmem.c
> +++ b/shared/vg_replace_strmem.c
> @@ -710,6 +710,7 @@ static inline void my_exit ( int x )
>   STRNCMP(VG_Z_LIBC_SONAME, __GI_strncmp)
>   STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse2)
>   STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse42)
> + STRNCMP("ld-linux*.so*", strncmp)
> 
>  #elif defined(VGO_freebsd)
>   STRNCMP(VG_Z_LIBC_SONAME, strncmp)

This looks like https://bugs.kde.org/show_bug.cgi?id=434764
iconv_open causes ld.so v2.28+ to use optimised strncmp

Which was recently (but after 3.19.0) fixed by merging this commit:

commit 947388eb043ea1c44b37df94046e1eee790ad776
Author: Mike Crowe 
AuthorDate: Mon Sep 9 14:16:16 2019 +0100
Commit: Mark Wielaard 
CommitDate: Sat May 14 00:41:18 2022 +0200

Intercept strncmp for glibc ld.so v2.28+

In glibc 5aad5f617892e75d91d4c8fb7594ff35b610c042 (first released in
v2.28) a call to strncmp was added to dl-load.c:is_dst. This causes
valgrind to complain about glibc's highly-optimised strncmp performing
sixteen-byte reads on short strings in ld.so. Let's intercept strncmp in
ld.so too so we use valgrind's simple version to avoid this problem.

Cheers,

Mark





Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-21 Thread John Reiser

> This looks like https://bugs.kde.org/show_bug.cgi?id=434764
> iconv_open causes ld.so v2.28+ to use optimised strncmp
>
> Which was recently (but after 3.19.0) fixed by merging this commit:
>
> commit 947388eb043ea1c44b37df94046e1eee790ad776
> Author: Mike Crowe
> AuthorDate: Mon Sep 9 14:16:16 2019 +0100
> Commit: Mark Wielaard
> CommitDate: Sat May 14 00:41:18 2022 +0200
>
>     Intercept strncmp for glibc ld.so v2.28+


The C standard claims that all function names that begin with
'str' or 'mem' are reserved, and that language processors (compilers,
valgrind, ...)  may assume that such names designate the Standard functions.
If so, then valgrind's interception code is too picky about the context
for re-directing these functions; just use
 STRNCMP("*", strncmp)
etc. to re-direct them all.






Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-21 Thread John Reiser

> This indicates that 'strncmp' should be re-directed from ld-linux-x86-64.so.2:


That was done already in valgrind-3.19.0.  The user problem was reported against:
-
==4295== Memcheck, a memory error detector
==4295== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==4295== Using Valgrind-3.18.1-42b08ed5bd-20211015 and LibVEX; rerun with -h for copyright info
-
which probably does not contain the fix:
$ git blame shared/vg_replace_strmem.c
947388eb04 (Mike Crowe  2019-09-09 14:16:16 +0100  713)  STRNCMP(VG_Z_LD_LINUX_SO_2, strncmp)
947388eb04 (Mike Crowe  2019-09-09 14:16:16 +0100  714)  STRNCMP(VG_Z_LD_LINUX_X86_64_SO_2, strncmp)




Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-21 Thread John Reiser

> I sent a log of redirect information to both Paul and John since the log was
> too large for the mailing list.
>
> I wonder what would be the preferred public sharing site for such a purpose
> these days.


The preferred way is to create a bug report, attach the large file to the bug report,
then post the URL of the bug report in a message to the mailing list.

Begin at https://valgrind.org/ .  In the left nav, click on "Bug Reports", and follow
the directions on the resulting page.



> 143:39.43 GECKO(115765) ==115769== Invalid read of size 8
> 143:39.64 GECKO(115765) ==115769==    at 0x4021BF4: strncmp (strcmp.S:175)
> 143:39.64 GECKO(115765) ==115769==    by 0x400655D: is_dst (dl-load.c:214)


This indicates that 'strncmp' should be re-directed from ld-linux-x86-64.so.2:
=
diff --git a/shared/vg_replace_strmem.c b/shared/vg_replace_strmem.c
index 3b42b3a87..8272a3ae7 100644
--- a/shared/vg_replace_strmem.c
+++ b/shared/vg_replace_strmem.c
@@ -710,6 +710,7 @@ static inline void my_exit ( int x )
  STRNCMP(VG_Z_LIBC_SONAME, __GI_strncmp)
  STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse2)
  STRNCMP(VG_Z_LIBC_SONAME, __strncmp_sse42)
+ STRNCMP("ld-linux*.so*", strncmp)

 #elif defined(VGO_freebsd)
  STRNCMP(VG_Z_LIBC_SONAME, strncmp)
=
For instance, such a change is relevant to glibc-2.33-21.fc34.x86_64:
$ readelf --all /lib64/ld-linux-x86-64.so.2 | grep strncmp
  1706: 00022d30  6233 FUNC    LOCAL  DEFAULT   13 strncmp
$




Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread ISHIKAWA,chiaki

Hi,

I sent a log of redirect information to both Paul and John since the log
was too large for the mailing list.


I wonder what would be the preferred public sharing site for such a
purpose these days.


TIA

Chiaki





Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread John Reiser

> (Wait, I see "279:13.65 GECKO(392456) ==392459==    by 0x488D2D3: dlopen@@GLIBC_2.2.5 (dlopen.c:87)"
> Version 2.2.5 is not the same as the version reported for glibc. Hmm? )


The "@@GLIBC_2.2.5" is the linking symbol version assigned by glibc.
This effectively is an ABI version, and the ABI for dlopen
has not changed for many years, even though other parts of
glibc have changed; one recent release is glibc-2.33.

The real key to Chiaki's problem is:

279:13.65 GECKO(392456) ==392459== Invalid read of size 8
279:13.65 GECKO(392456) ==392459==    at 0x4021BF4: strncmp (strcmp.S:175)

which says that this 'strncmp' was not re-directed by valgrind.
Re-running valgrind with the additional command-line parameter
"--trace-redir=yes" will help provide more information.
Probably the run can be stopped after the first actual dlopen,
because that should be enough to trigger all the redirections
that matter here.




Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread ISHIKAWA,chiaki

Dear Paul,



I collected the version number info and have been running the TB test suite
under valgrind since this morning.

That was before I read your e-mail.

I will give the version number info below first and then see if I can run valgrind
to obtain the redirection information.
(The thing is, the already-running valgrind+thunderbird is stretching my
16GB-memory linux image, and I am not sure if I can start another
instance of valgrind+thunderbird or whether I need to bite the bullet and
cancel the current run. I am afraid that the test takes close to a full
day...)
Anyway, let me first send this version info, and then I will check whether
I can obtain the redirection info easily.



Obviously, the trace does not show a redirected symbol for strncmp.
That is for sure.

I do see redirection for malloc:
279:13.66 GECKO(392456) ==392459==    at 0x483F7B5: malloc (vg_replace_malloc.c:381)


--- version info ---

Hi,

Until I figure out how to create a short reproducer, here is the
version info I collected.
[] Debian Version
ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ uname -a
Linux ip030 5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1 (2022-04-18) x86_64 GNU/Linux


[gcc-10] Compiler used. I just re-compiled the source tree using this
compiler and still get the same error (trace attached at the end).


Maybe I should use a newer version, but the thunderbird mail client heavily
relies on mozilla source code, and a newer compiler version may run into
compiler issues (warnings or worse).

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ gcc-10 --version
gcc-10 (Debian 10.3.0-15) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[glibc-a] As for glibc: I was not sure how to check the version, but
here it is. ldd --version and running libc.so as a program were things I never
realized we could do (!)


ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ ldd /NEW-SSD/moz-obj-dir/objdir-tb3/dist/bin/thunderbird

    linux-vdso.so.1 (0x7fffa31ae000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f4403b64000)
    /lib64/ld-linux-x86-64.so.2 (0x7f4403d5d000)

[glibc-b]  ldd --version reports:

ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ ldd --version
ldd (Debian GLIBC 2.33-7) 2.33
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

[glibc-c] I did not realize that we can "run" the GLIBC libc.so file this
way to obtain the glibc version number.

The above info all points to Debian GLIBC 2.33-7.
ishikawa@ip030:/NEW-SSD/NREF-COMM-CENTRAL/work-dir$ /lib/x86_64-linux-gnu/libc.so.6

GNU C Library (Debian GLIBC 2.33-7) release release version 2.33.
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 10.3.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
.

[] Version of valgrind:

   valgrind --version
   valgrind-3.18.1

(Well, I was quite upset when I initially realized I was using
valgrind-3.18.0.GIT, which I installed last September,
but I then verified that the bug appears with the current release, too.)

[Source code] mozilla comm-central source version is:
I have added a few local mods, but they don't touch the affected
code.

changeset:   35764:90328ce5bee2
tag: qparent
fxtree:  comm
user:    John Bieling 
date:    Wed May 18 13:13:33 2022 +0300
summary: Bug 1732554 - Make GenericSendMessage async. r=mkmelin

changeset:   35763:74a4091d1c27

[Source code] mozilla mozilla-central source version is:
Again, I have 

Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread Floyd, Paul

Hi Chiaki

Debugging redirection issues isn't normally too slow. Redirection is 
done when Valgrind loads the guest executable and libraries.


Run Valgrind with --trace-redir=yes and you should see Valgrind printing 
what it finds in


 * ld.so, the link loader
 * the client executable
 * the valgrind tool
 * the valgrind shared lib preloads (core and tool)
 * any client shared libraries

libc falls under the last category, though there is also a small number of C
functions in the link loader (memcpy, strcmp, etc.).


You should see things like

--830--  ld-linux-x86-64.so.2 strcmp         RL-> (2016.0) 0x040343b0
--830--  libc.so*             __strcmp_sse42 RL-> (2016.0) 0x04034370
--830--  libc.so*             __strcmp_sse2  RL-> (2016.0) 0x04034330
--830--  libc.so*             __GI_strcmp    RL-> (2016.0) 0x040342f0


If you don't see any symbols being redirected then you have a problem.


A+

Paul





Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-20 Thread ISHIKAWA,chiaki

Dear Paul,

Thank you for your e-mail and the lucid explanation.

I am sorry that I could not write to you earlier.
There was something wrong with my PC hardware and it took me quite a 
while to re-install many software products I regularly use.


I will try to create a short sample. (The whole thunderbird software is
a gigantic program.) But it may be difficult,
since the source code is large, and if the compiler's code generation is
history-sensitive, the problem may not be easy to re-create.


I will also check the versions of the tools that were used when the
problem was noticed.

Let me have a couple of hours to check the versions.

BTW, now I vaguely recall that there was an issue with the DL-library
released many years ago by Debian regarding the
symbols for strcpy and friends. I can't recall the details now, but in
that instance, the lack of proper debug symbols made the
re-direction difficult(?). If my hazy memory is correct, today's
case may be influenced by a similar issue, but I had better collect the
versions so that someone in the know can experiment on their end.
Back then, I think I created a wrapper that introduces the symbols for
strcmp and friends. But that was many years ago.


TIA

Chiaki

PS: For those curious about the hardware issue: I wanted to
replace my Ryzen 1700 CPU with 16MB of L3 cache with a
Ryzen 3700x with 32MB of cache, solely because I learned that
the larger the cache, the better valgrind fares when running a big program
like the thunderbird mail client.
After a few years of using the 1700, I suspect the CPU is the limiting
factor. I like it, though: it uses much less power than many other modern CPUs,
so it runs cool, and the PC is very quiet without noisy fans.
Unfortunately, when I replaced the CPU after carefully checking the BIOS
version, etc. to make sure the CPU would run on the motherboard (yes, it
did; it runs linux without any issue at all),
somehow Windows 10 Pro, which hosts my virtualbox running linux, did not boot
any more after the replacement, and it
trashed my boot environment. Aargh.

In the end, I figured out it was a faulty AMD SATA driver, which got installed
maybe in the last couple of years when I installed AMD's chipset driver.
It did not cause a problem with the Ryzen 1700 for the last few years, but
with the 3700x, the boot fails because of it.

After the boot failure, even safe mode failed to boot. Ugh.
I had to re-install Windows, and so had to re-install many applications
and such that I use for work and hobby. Oh, such is life.
But I am a happy camper now with the second-hand Ryzen 3700x and hope to
run and find more of these valgrind issues in TB soon. The whole build
time from scratch got shortened from about 90+ minutes to 60+ minutes. Not bad.
I have yet to measure how much TB's test suite execution
time has been shortened. I am hampered by strange errors that I did not notice a few
months ago. Maybe these are newly introduced errors, including the
one I reported, and I am analyzing whether I can simply suppress them or
need to investigate them in detail.









Re: [Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-11 Thread Floyd, Paul

Hi

Can you give us

the source of the small reproducer

the versions of Valgrind, Debian, GCC and glibc?

As you mention, functions like strncmp are often optimized to work on
multiple bytes at a time and to take advantage of the fact that memory
will always be allocated in a multiple of, say, 8 or 16 bytes. And what
sometimes happens is that a function like strncmp will be replaced by the
compiler with something like __strncmp_avx128 or something like that. If
Valgrind doesn't recognize this it can't redirect it and do error
checking on it.


I would expect the error message to contain the name of the Valgrind
redirect, for instance


==22489==    at 0x4033B7C: __strncmp_sse42 (vg_replace_strmem.c:712)

So it seems to me that you have a redirection problem. For some reason
Valgrind is not seeing your strncmp when the client libc gets loaded
into memory.



A+

Paul






[Valgrind-users] Question: bug in user code, valgrind or glibc?

2022-05-10 Thread ISHIKAWA,chiaki

Hi,

I have been analyzing the thunderbird mail client under valgrind for some time.
memcheck has been so useful for me to find memory-related errors.
Thank you for releasing this great tool.

Recently, I noticed an "Invalid read of size 8" warning, which should be
familiar to all of us.


Interestingly, the initial part of the stack trace is found in a report
in the Qt bug database.

It comes from the dynamic loading library support.
https://bugreports.qt.io/browse/QTBUG-90374
It was filed last year.

My system is Debian GNU/Linux and I used gcc to compile thunderbird.
The report was done by someone who uses clang.

I wonder whether the issue lies in a certain version of the dl-library, in glibc, OR in
valgrind. The reason I say valgrind might be to blame, too, is as follows.
(Debian is known to release toolchains very conservatively. I think that
is why I did not see this issue last year.)


Actually, my line numbers are slightly off due to version differences, I
suspect.


143:39.43 GECKO(115765) ==115769== Invalid read of size 8
143:39.64 GECKO(115765) ==115769==    at 0x4021BF4: strncmp (strcmp.S:175)
143:39.64 GECKO(115765) ==115769==    by 0x400655D: is_dst (dl-load.c:214)
143:39.64 GECKO(115765) ==115769==    by 0x4007666: _dl_dst_count (dl-load.c:251)
143:39.64 GECKO(115765) ==115769==    by 0x4007857: expand_dynamic_string_token (dl-load.c:393)
143:39.64 GECKO(115765) ==115769==    by 0x40079C7: fillin_rpath.isra.0 (dl-load.c:465)
143:39.68 GECKO(115765) ==115769==    by 0x4007CC2: decompose_rpath (dl-load.c:636)
143:39.68 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:678)
143:39.68 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:659)

      ... [omitted] ...

My local valgrind dump tells me where the address was allocated.

143:40.60 GECKO(115765) ==115769==  Address 0x27ba3819 is 9 bytes inside a block of size 15 alloc'd
143:40.65 GECKO(115765) ==115769==    at 0x483CF9B: malloc (vg_replace_malloc.c:380)
143:40.65 GECKO(115765) ==115769==    by 0x402074B: malloc (rtld-malloc.h:56)
143:40.65 GECKO(115765) ==115769==    by 0x402074B: strdup (strdup.c:42)
143:40.65 GECKO(115765) ==115769==    by 0x4007C54: decompose_rpath (dl-load.c:611)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:678)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: cache_rpath (dl-load.c:659)
143:40.65 GECKO(115765) ==115769==    by 0x4009E9D: _dl_map_object (dl-load.c:2174)
143:40.65 GECKO(115765) ==115769==    by 0x400E4B0: openaux (dl-deps.c:64)
  ... [omission] ...

I *think* this is a valid error: a large-sized READ in strncmp reads beyond the
allocated memory boundary. (strcmp.S shows 8 octets read at once instead of one
octet at a time.)


I think such usage of the strdup/str{n}cmp combination abounds in C
source code.

So I thought maybe valgrind was reporting something different.
Otherwise, many application programs would have to create suppressions for this
type of issue.

That is what I thought initially.

Another possibility I initially considered was that "9 bytes
inside a block of size 15" might mean that the string area contains
uninitialized data at that position.  However, come
to think of it, if so, strdup would have triggered a valgrind warning
before this.  There is no warning from valgrind for strdup.

Also, I created a test program and found that in that case, valgrind prints


==120076== Conditional jump or move depends on uninitialised value(s)
==120076==    at 0x4843172: strncmp (vg_replace_strmem.c:663)
==120076==    by 0x108778: main (in /home/ishikawa/Dropbox/TB-DIR/a.out)

So the original problem must be the read beyond malloc'ed area boundary.

Now, is the dl-library to blame?
I think the dl-library has been used literally hundreds of millions of times or
more daily, and
it is hard to think that there is a bug there. (Famous last words.)

The dl-library has no control over how long each path string is (I
think it is recording the path components of a load path),
and thus cannot prevent the valgrind messages caused by an 8-byte read
going beyond the end of the malloc'ed memory. (So people probably have to
create suppressions after all, if their particular version has this
issue.)

As for valgrind, can valgrind be somehow more intelligent in this
case?  Maybe by providing a substitute strcmp? (I know comparing a single
character at a time would be slower than comparing 8 characters at a
time where appropriate.)  But at least this type of surprising warning
would be reduced.

However, we may have a problem here for glibc.  If this read beyond
the malloc'ed region is real, we have a problem.  I have no idea how
this behavior is constrained or sanctioned by the C standard, the C library
standard, or the POSIX standard, but the 8-octet reads in strcmp.S could possibly
lead to a real issue unless malloc() uniformly allocates memory chunks
in units of 8 bytes or larger. Unless glibc makes sure that there is a
guard area between the malloc area and the