Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.
Control: forwarded -1 https://rt.perl.org/Ticket/Display.html?id=131575

On Wed, Jun 14, 2017 at 09:28:44PM +0300, Niko Tyni wrote:
> On Wed, Jun 14, 2017 at 07:16:35PM +0200, Benjamin Bayart wrote:
> > Package: perl
> > Version: 5.24.1-3
> > Severity: normal
> > Tags: upstream
> >
> > In some cases, some valid utf-8 chinese (or japanese Kanji) chars
> > in a perl string makes perl die on "Malformed UTF-8" while matching
> > a regexp.

> I'll try to bisect this and forward upstream.

This seems to have regressed in 5.23.4 with

 https://perl5.git.perl.org/perl.git/commit/147f21b5b8054c559a1ffb568dbf310244fa0c91

and I've forwarded the issue upstream as

 https://rt.perl.org/Ticket/Display.html?id=131575

-- 
Niko Tyni   nt...@debian.org
Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.
Control: tag -1 confirmed

On Wed, Jun 14, 2017 at 07:16:35PM +0200, Benjamin Bayart wrote:
> Package: perl
> Version: 5.24.1-3
> Severity: normal
> Tags: upstream

> In some cases, some valid utf-8 chinese (or japanese Kanji) chars
> in a perl string makes perl die on "Malformed UTF-8" while matching
> a regexp.
>
> Here is the smallest program (all in ascii, for safety) creating
> the problem.

Thanks for the report and the test case.

Running this with debugperl under valgrind shows invalid memory
accesses, log below. It also happens with 5.26.0, but indeed not with
the jessie 5.20 perl.

I got it down to a somewhat simpler form

 #!/usr/bin/perl
 use strict;
 use warnings;
 my $text = "%t%\x{6bce}";
 $text =~ s{~*%[a-z]%}{}g;
 print "Works, for now\n";

which still crashes here and shows similar valgrind errors.

I'll try to bisect this and forward upstream.

==15091== Memcheck, a memory error detector
==15091== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15091== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==15091== Command: debugperl 864782.pl
==15091==
==15091== Invalid read of size 1
==15091==    at 0x4C30027: memchr (vg_replace_strmem.c:883)
==15091==    by 0x20795B: Perl_fbm_instr (util.c:828)
==15091==    by 0x311B9C: Perl_re_intuit_start (regexec.c:907)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091==
==15091== Invalid read of size 1
==15091==    at 0x2FB0D1: S_reginclass (regexec.c:9038)
==15091==    by 0x30BB9C: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091==
==15091== Invalid read of size 1
==15091==    at 0x30BB67: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091==
==15091== Invalid read of size 1
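
As a side check on the claim that the string itself is fine, the reduced test string can be inspected without running any regexp over it. The following is only a minimal sketch: describe_string() is a hypothetical helper introduced here, not something from the report, and it relies only on the core utf8:: utility functions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Summarise how perl stores a string internally.
# describe_string() is a hypothetical helper, not part of the bug report.
sub describe_string {
    my ($text) = @_;
    my $bytes = $text;
    utf8::encode($bytes);    # in place: characters -> UTF-8 octets
    return (
        flag  => utf8::is_utf8($text) ? 1 : 0,    # internal UTF-8 flag set?
        valid => utf8::valid($text)   ? 1 : 0,    # internal encoding consistent?
        chars => length $text,                    # length in characters
        bytes => length $bytes,                   # length in UTF-8 octets
    );
}

# Same string as in the reduced test case above.
my %info = describe_string("%t%\x{6bce}");
printf "%-5s %d\n", $_, $info{$_} for qw(flag valid chars bytes);
```

For the reduced string this reports four characters and six octets, since U+6BCE takes three octets in UTF-8. A string that passes this check and still triggers "Malformed UTF-8" during matching points at the regexp engine rather than at the data, which is consistent with the out-of-bounds reads in the log.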
Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.
On Wed, 14 Jun 2017 19:16:35 +0200, Benjamin Bayart wrote:

> In some cases, some valid utf-8 chinese (or japanese Kanji) chars
> in a perl string makes perl die on "Malformed UTF-8" while matching
> a regexp.
>
> Here is the smallest program (all in ascii, for safety) creating
> the problem.

Now that's interesting.

I ran the script in a loop on my laptop (amd64, Debian unstable), and
it didn't error out a single time in over 100_000 runs.

OTOH, on one of my raspis (armhf-ish, Raspbian stretch), it didn't
even succeed a single time in a couple of tries, and always fails with

 Failed Malformed UTF-8 character (fatal) at crash.pl line 8.

And on a third machine, a remote server (amd64, Debian stretch), I got
the first pass only after over 400 failures.

All with perl 5.24.1-3.

So whatever is going on here seems a bit nondeterministic …

Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at/ - Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
 `. `'  Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
   `-   NP: Various Artists: Black Velvet Band
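
Running the script in a loop and counting failures, as described above, could be done with a small harness along these lines. This is a sketch under stated assumptions, not the loop that was actually used: run_failed() and count_failures() are hypothetical names, a POSIX shell is assumed for the "2>&1" redirection, and a run is treated as failed if it exits non-zero or prints "Malformed UTF-8" (the second check matters because a script that wraps the match in eval may still exit with status 0).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a script once under the current perl and report whether it failed.
# Assumes a POSIX shell (for "2>&1") and paths without double quotes.
sub run_failed {
    my ($script) = @_;
    my $perl = $^X;                          # interpreter running this harness
    my $out  = qx{"$perl" "$script" 2>&1};   # capture stdout and stderr together
    $out = '' unless defined $out;
    # Failed if the exit status is non-zero or the error text shows up.
    return ($? != 0 || $out =~ /Malformed UTF-8/) ? 1 : 0;
}

# Count how many of $runs executions of $script failed.
sub count_failures {
    my ($script, $runs) = @_;
    my $failures = 0;
    $failures += run_failed($script) for 1 .. $runs;
    return $failures;
}

# Hypothetical command line: harness.pl crash.pl [runs]
if (@ARGV) {
    my ($script, $runs) = @ARGV;
    $runs = 1000 unless defined $runs;
    my $failures = count_failures($script, $runs);
    print "$failures of $runs runs failed\n";
}
```

Because the failure depends on what happens to lie past the end of the string buffer, the count can differ widely between machines and even between runs, so a single pass or fail says little on its own.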
Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.
Package: perl
Version: 5.24.1-3
Severity: normal
Tags: upstream

Dear Maintainer,

In some cases, some valid utf-8 chinese (or japanese Kanji) chars
in a perl string makes perl die on "Malformed UTF-8" while matching
a regexp.

Here is the smallest program (all in ascii, for safety) creating
the problem.

#!/usr/bin/perl
use strict;
use warnings;
my $text = "[quant,_1,\x{55b6}\x{696d}\x{65e5},\x{55b6}\x{696d}\x{65e5}]\x{6bce}";
eval {$text =~ s{((?