[Bug sanitizer/99814] regexec fails with -fsanitize=address

2021-03-30 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99814

--- Comment #7 from Jakub Jelinek  ---
(In reply to Alex Richardson from comment #5)
> Does the sanitizer runtime library include the
> https://reviews.llvm.org/D96348 patch?
> 
> IMO the real issue is that dlsym() with RTLD_NEXT selects the oldest
> versioned symbol. Not sure why that behaviour was chosen.
> I'm sure there are lots of other sanitizer interceptors that are also
> affected by https://sourceware.org/bugzilla/show_bug.cgi?id=1319.

dlsym behavior matches the behavior of normal symbol lookup resolution.
When glibc (and some other libraries) started out, they were unversioned, and
symbol versions were added later. Libraries or binaries linked against a very
old glibc use unversioned symbols, so for ABI compatibility those references
naturally need to be resolved against the oldest symbol version.
Libraries/binaries linked against newer glibc versions have versioned symbols
and use both the symbol name and the symbol version in symbol lookup (i.e. like
dlvsym).
For dlsym, one doesn't really know which era the library or binary was linked
in or what it expects: it could be a very old binary, a newer one, or the most
recent one, so if the same symbol has multiple symbol versions, it is unknown
which one to choose. So, for symbols with more than one symbol version, one
should use dlvsym instead of dlsym.
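
The resolution rule described above can be illustrated with a toy model. This is a sketch of the rule as stated in this comment only, not glibc's actual lookup code (glibc's dlsym and RTLD_NEXT handling have their own details; see the glibc bugs linked in this thread). The struct, the resolve() helper, and the version list (which assumes regexec is defined as GLIBC_2.2.5 and GLIBC_2.3.4, as on x86_64) are all illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Toy model, not glibc code: one symbol name defined under several
   versions, listed oldest first. */
struct versioned_def {
  const char *version; /* symbol version, e.g. "GLIBC_2.3.4" */
  const char *impl;    /* label standing in for the definition's address */
};

/* Assumed x86_64 layout of regexec's versions (illustrative only). */
static const struct versioned_def regexec_defs[] = {
    {"GLIBC_2.2.5", "regexec (oldest ABI)"},
    {"GLIBC_2.3.4", "regexec (newest ABI)"},
};

/* version == NULL models an unversioned reference (e.g. a binary linked
   against a pre-versioning glibc): it binds to the oldest definition for ABI
   compatibility. A non-NULL version models a versioned reference (or
   dlvsym): name and version must match exactly, otherwise NULL. */
static const char *resolve(const struct versioned_def *defs, size_t n,
                           const char *version) {
  if (n == 0)
    return NULL;
  if (version == NULL)
    return defs[0].impl;
  for (size_t i = 0; i < n; i++)
    if (strcmp(defs[i].version, version) == 0)
      return defs[i].impl;
  return NULL;
}
```

In this model a plain dlsym-style query only has the NULL case available, which is why, for a symbol with more than one version, the comment recommends dlvsym.
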
Ideally, the libsanitizer shared libraries would be symbol versioned: their own
APIs with some sanitizer-specific symbol version(s), the symbols they intercept
from glibc with the symbol versions of the glibc they were configured against,
and, for symbols with multiple symbol versions, multiple interceptors, each of
which should use dlvsym if it calls the intercepted function.
That would mean scanning the glibc symbol versions at library configure time
and deciding on the *san version scripts and predefined macros based on that.

2021-03-30 Thread marxin at gcc dot gnu.org via Gcc-bugs

--- Comment #6 from Martin Liška  ---
(In reply to Alex Richardson from comment #5)
> Does the sanitizer runtime library include the
> https://reviews.llvm.org/D96348 patch?

Yes, the change was merged into GCC master some time ago.

> 
> IMO the real issue is that dlsym() with RTLD_NEXT selects the oldest
> versioned symbol. Not sure why that behaviour was chosen.
> I'm sure there are lots of other sanitizer interceptors that are also
> affected by https://sourceware.org/bugzilla/show_bug.cgi?id=1319.

Shouldn't dlvsym return exactly one symbol in this case? Can't we rely on
that?

2021-03-30 Thread Alexander.Richardson at cl dot cam.ac.uk via Gcc-bugs

Alex Richardson  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Alexander.Richardson at cl dot cam.ac.uk

--- Comment #5 from Alex Richardson  ---
Does the sanitizer runtime library include the https://reviews.llvm.org/D96348
patch?

IMO the real issue is that dlsym() with RTLD_NEXT selects the oldest versioned
symbol. Not sure why that behaviour was chosen.
I'm sure there are lots of other sanitizer interceptors that are also affected
by https://sourceware.org/bugzilla/show_bug.cgi?id=1319.

2021-03-30 Thread stefansf at linux dot ibm.com via Gcc-bugs

--- Comment #4 from Stefan Schulze Frielinghaus  ---
Thanks for the pointers!  I reported it upstream in issue
[1390](https://github.com/google/sanitizers/issues/1390)

2021-03-30 Thread marxin at gcc dot gnu.org via Gcc-bugs

--- Comment #3 from Martin Liška  ---
Strange. Please report it upstream:
https://github.com/google/sanitizers/issues

and CC the people involved in https://reviews.llvm.org/D96348.

2021-03-30 Thread stefansf at linux dot ibm.com via Gcc-bugs

--- Comment #2 from Stefan Schulze Frielinghaus  ---
Breakpoint 4, __interception::InterceptFunction (name=0x3fffd61e8f2 "regexec",
ver=0x3fffd61eb7e "GLIBC_2.3.4", ptr_to_real=0x3fffd677d08
<__interception::real_regexec>, func=16779728, 
wrapper=4398001883504) at
/devel/gcc-4/src/libsanitizer/interception/interception_linux.cpp:74
74        void *addr = GetFuncAddr(name, ver);

At the end of InterceptFunction we have:

(gdb) print addr
$1 = (void *) 0x3fffd2e9110 <__GI___regexec>

The address itself also LGTM: `readelf -s /lib64/libc.so.6 | grep regexec`
shows:
   279: 000e9110   344 FUNC    GLOBAL DEFAULT   13 regexec@@GLIBC_2.3.4
...
 25156: 000e9110   344 FUNC    LOCAL  DEFAULT   13 __GI___regexec

However, the variables func and wrapper differ:

(gdb) print func
$2 = 16779728
(gdb) print wrapper
$3 = 4398001883504

so we return false.
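
The failing check can be modelled in a few lines. This is a hedged sketch that assumes, based only on the debug session above, that interception counts as successful when a real address was found and func equals wrapper; intercept_succeeded() is a hypothetical name, not libsanitizer's code. In hex, func = 16779728 is 0x10009d0 and wrapper = 4398001883504 is 0x3fffd570970, so wrapper lies near the other 0x3fffd... addresses in the breakpoint output while func does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the final check in InterceptFunction, assuming
   (from the gdb session) that success requires a non-null real address
   AND func == wrapper. */
static bool intercept_succeeded(const void *addr, uintptr_t func,
                                uintptr_t wrapper) {
  return addr != NULL && func == wrapper;
}
```

With the values printed by gdb, addr is non-null but func != wrapper, so the model returns false, matching the report.
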

2021-03-30 Thread marxin at gcc dot gnu.org via Gcc-bugs

Martin Liška  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-03-30

--- Comment #1 from Martin Liška  ---
Thanks for the report. Hm, that's strange, as we should be requesting exactly
this version of the symbol through the following code path:

  COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN(regexec, "GLIBC_2.3.4");  \

#ifdef __GLIBC__
// If we could not find the versioned symbol, fall back to an unversioned
// lookup. This is needed to work around a GLibc bug that causes dlsym
// with RTLD_NEXT to return the oldest versioned symbol.
// See https://sourceware.org/bugzilla/show_bug.cgi?id=14932.
// For certain symbols (e.g. regexec) we have to perform a versioned lookup,
// but that versioned symbol will only exist for architectures where the
// oldest Glibc version pre-dates support for that architecture.
// For example, regexec@GLIBC_2.3.4 exists on x86_64, but not RISC-V.
// See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98920.
#define COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN(fn, ver) \
  COMMON_INTERCEPT_FUNCTION_VER_UNVERSIONED_FALLBACK(fn, ver)
#else
#define COMMON_INTERCEPT_FUNCTION_GLIBC_VER_MIN(fn, ver) \
  COMMON_INTERCEPT_FUNCTION(fn)
#endif

#define ASAN_INTERCEPT_FUNC_VER_UNVERSIONED_FALLBACK(name, ver)              \
  do {                                                                       \
    if (!INTERCEPT_FUNCTION_VER(name, ver) && !INTERCEPT_FUNCTION(name))     \
      VReport(1, "AddressSanitizer: failed to intercept '%s@@%s' or '%s'\n", \
              #name, #ver, #name);                                           \
  } while (0)
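
The versioned-then-unversioned lookup this macro performs can be sketched with the plain dlvsym/dlsym API. This is a minimal sketch assuming glibc (dlvsym and RTLD_DEFAULT are GNU extensions; on glibc older than 2.34, link with -ldl). The helper lookup_with_fallback() is hypothetical, and it uses RTLD_DEFAULT instead of the RTLD_NEXT handle a sanitizer runtime would use, because it runs in an ordinary program rather than in an interposing library.

```c
#define _GNU_SOURCE /* dlvsym and RTLD_DEFAULT are GNU extensions */
#include <assert.h>
#include <dlfcn.h>
#include <regex.h>
#include <stddef.h>

/* Signature of POSIX regexec(). */
typedef int (*regexec_fn)(const regex_t *, const char *, size_t,
                          regmatch_t[], int);

/* Hypothetical helper, not part of libsanitizer: ask for name@version first
   and fall back to an unversioned lookup only if that fails, mirroring the
   control flow of ASAN_INTERCEPT_FUNC_VER_UNVERSIONED_FALLBACK. Sets
   *used_versioned to 1 if the versioned lookup succeeded, else 0. */
static void *lookup_with_fallback(const char *name, const char *version,
                                  int *used_versioned) {
  assert(name != NULL && version != NULL && used_versioned != NULL);
  void *addr = dlvsym(RTLD_DEFAULT, name, version);
  *used_versioned = (addr != NULL);
  if (addr == NULL)
    addr = dlsym(RTLD_DEFAULT, name); /* unversioned fallback */
  return addr;
}
```

Whether the versioned lookup succeeds depends on the architecture: per the comment in the code above, regexec@GLIBC_2.3.4 exists on x86_64 but not on RISC-V, where only the fallback finds the symbol.
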


Can you please debug whether INTERCEPT_FUNCTION_VER really fails?
Sorry, but I don't have an s390 machine handy.