https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118113
Bug ID: 118113
Summary: std::regex construction from string literal causes
out-of-bounds access when compiled with O2 and LTO
Product: gcc
Version: 14.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: david.cortes.rivera at gmail dot com
Target Milestone: ---
Forwarding an example by Ivan Krylov, originally posted here:
https://stat.ethz.ch/pipermail/r-package-devel/2024q4/011309.html
Creating a regex from a string literal will cause an out-of-bounds access when
both -O2 and -flto are used together. It is reproducible with GCC versions 14.2
and 12.2 as far as I can tell.
Code to reproduce (courtesy of Ivan Krylov):
#include <iostream>
#include <regex>
int main() {
    std::string s{" gjdshlkhj \" lsjkhkljh "};
    const char * rx = "\"";
    std::cout
        << std::regex_replace(s, std::regex(rx), "\\\"") // <-- line 7
        << std::endl;
    // the code below is required for the problem to happen above!
    for (int i = 0; i < 100; ++i) volatile std::regex rxx(rx);
}
If compiled as follows:
g++ -fsanitize=address -O2 -flto=auto bugged_regex.cpp
Then running it results in the following error message:
=================================================================
==33379==ERROR: AddressSanitizer: global-buffer-overflow on address
0x563bfc3d1482 at pc 0x563bfc37cf37 bp 0x7ffd574b4a70 sp 0x7ffd574b4a68
READ of size 1 at 0x563bfc3d1482 thread T0
#0 0x563bfc37cf36 in std::__detail::_Scanner<char>::_M_advance()
(/home/david/c_quicktest/a.out+0x1df36)
#1 0x563bfc384aee in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_try_char()
(/home/david/c_quicktest/a.out+0x25aee)
#2 0x563bfc39838a in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(/home/david/c_quicktest/a.out+0x3938a)
#3 0x563bfc398d40 in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(/home/david/c_quicktest/a.out+0x39d40)
#4 0x563bfc3a5a6d in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()
(/home/david/c_quicktest/a.out+0x46a6d)
#5 0x563bfc3b0c75 in
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char
const*, char const*, std::locale const&,
std::regex_constants::syntax_option_type) [clone .constprop.0]
(/home/david/c_quicktest/a.out+0x51c75)
#6 0x563bfc3b1f78 in std::__cxx11::basic_regex<char,
std::__cxx11::regex_traits<char> >::_M_compile(char const*, char const*,
std::regex_constants::syntax_option_type) [clone .constprop.0]
(/home/david/c_quicktest/a.out+0x52f78)
#7 0x563bfc36c9e4 in main (/home/david/c_quicktest/a.out+0xd9e4)
#8 0x7f9978d85249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
#9 0x7f9978d85304 in __libc_start_main_impl ../csu/libc-start.c:360
#10 0x563bfc3723eb (/home/david/c_quicktest/a.out+0x133eb)
0x563bfc3d1482 is located 62 bytes before global variable '*.LC41' defined in
'./a.ltrans1.ltrans' (0x563bfc3d14c0) of size 145
'*.LC41' is ascii string 'Number of NFA states exceeds limit. Please use
shorter regex string, or use smaller brace expression, or make
_GLIBCXX_REGEX_STATE_LIMIT larger.'
0x563bfc3d1482 is located 0 bytes after global variable '*.LC40' defined in
'./a.ltrans1.ltrans' (0x563bfc3d1480) of size 2
'*.LC40' is ascii string '"'
SUMMARY: AddressSanitizer: global-buffer-overflow
(/home/david/c_quicktest/a.out+0x1df36) in
std::__detail::_Scanner<char>::_M_advance()
Shadow bytes around the buggy address:
0x563bfc3d1200: 01 f9 f9 f9 f9 f9 f9 f9 00 00 00 02 f9 f9 f9 f9
0x563bfc3d1280: 00 00 00 00 00 00 f9 f9 f9 f9 f9 f9 00 00 00 00
0x563bfc3d1300: 05 f9 f9 f9 f9 f9 f9 f9 00 07 f9 f9 f9 f9 f9 f9
0x563bfc3d1380: 07 f9 f9 f9 f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9
0x563bfc3d1400: 00 f9 f9 f9 f9 f9 f9 f9 00 06 f9 f9 f9 f9 f9 f9
=>0x563bfc3d1480:[02]f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00
0x563bfc3d1500: 00 00 00 00 00 00 00 00 00 00 01 f9 f9 f9 f9 f9
0x563bfc3d1580: 00 00 00 00 00 00 01 f9 f9 f9 f9 f9 00 00 00 00
0x563bfc3d1600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01
0x563bfc3d1680: f9 f9 f9 f9 00 00 00 03 f9 f9 f9 f9 00 00 00 01
0x563bfc3d1700: f9 f9 f9 f9 00 00 00 00 00 00 05 f9 f9 f9 f9 f9
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==33379==ABORTING
Removing either -O2 or -flto makes the problem go away.
Note that the ASan error was originally reported for code that did not use
volatile, at this line:
https://github.com/david-cortes/isotree/blob/1f84128a03bb6fc5eecd1de7aebf4b745b54fa1e/src/formatted_exporters.cpp#L332