Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.

2017-06-14 Thread Niko Tyni
Control: forwarded -1 https://rt.perl.org/Ticket/Display.html?id=131575

On Wed, Jun 14, 2017 at 09:28:44PM +0300, Niko Tyni wrote:
> On Wed, Jun 14, 2017 at 07:16:35PM +0200, Benjamin Bayart wrote:
> > Package: perl
> > Version: 5.24.1-3
> > Severity: normal
> > Tags: upstream
> 
> > In some cases, some valid UTF-8 Chinese (or Japanese Kanji) chars
> > in a perl string make perl die on "Malformed UTF-8" while matching
> > a regexp.

> I'll try to bisect this and forward upstream.

This seems to have regressed in 5.23.4 with

 https://perl5.git.perl.org/perl.git/commit/147f21b5b8054c559a1ffb568dbf310244fa0c91

and I've forwarded the issue upstream as

 https://rt.perl.org/Ticket/Display.html?id=131575

-- 
Niko Tyni   nt...@debian.org



Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.

2017-06-14 Thread Niko Tyni
Control: tag -1 confirmed

On Wed, Jun 14, 2017 at 07:16:35PM +0200, Benjamin Bayart wrote:
> Package: perl
> Version: 5.24.1-3
> Severity: normal
> Tags: upstream

> In some cases, some valid UTF-8 Chinese (or Japanese Kanji) chars
> in a perl string make perl die on "Malformed UTF-8" while matching
> a regexp.
> 
> Here is the smallest program (all in ASCII, for safety) creating
> the problem.
 
Thanks for the report and the test case.

Running this with debugperl under valgrind shows invalid memory
accesses, log below.

It also happens with 5.26.0, but indeed not with the jessie 5.20
perl.

I got it down to a somewhat simpler form:


  #!/usr/bin/perl
  
  use strict;
  use warnings;
  
  my $text = "%t%\x{6bce}";
  
  $text =~ s{~*%[a-z]%}{}g;
  print "Works, for now\n";


which still crashes here and shows similar valgrind errors.
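
As a cross-check that the string in the reduced test case is well formed,
a minimal sketch along the following lines could be used. It relies only
on the core utf8:: functions and does not touch the regexp engine, so it
should not trigger the crash itself. The variable $bytes is introduced
here purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same string as in the reduced test case.
my $text = "%t%\x{6bce}";

# utf8::is_utf8() reports whether the internal UTF8 flag is set;
# utf8::valid() checks that the internal representation is consistent.
printf "utf8 flag: %s\n", utf8::is_utf8($text) ? "on"  : "off";
printf "valid:     %s\n", utf8::valid($text)   ? "yes" : "no";

# Encode a copy to UTF-8 bytes and dump them in hex.
# U+6BCE encodes as the three bytes e6 af 8e.
my $bytes = $text;
utf8::encode($bytes);
printf "bytes:     %s\n",
    join ' ', map { sprintf('%02x', ord($_)) } split(//, $bytes);
```

If this reports the string as valid while the substitution still dies,
that points at the regexp engine rather than at the input data.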

I'll try to bisect this and forward upstream.


==15091== Memcheck, a memory error detector
==15091== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15091== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==15091== Command: debugperl 864782.pl
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x4C30027: memchr (vg_replace_strmem.c:883)
==15091==    by 0x20795B: Perl_fbm_instr (util.c:828)
==15091==    by 0x311B9C: Perl_re_intuit_start (regexec.c:907)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x2FB0D1: S_reginclass (regexec.c:9038)
==15091==    by 0x30BB9C: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091== 
==15091== Invalid read of size 1
==15091==    at 0x30BB67: S_find_byclass (regexec.c:1869)
==15091==    by 0x312806: Perl_re_intuit_start (regexec.c:1293)
==15091==    by 0x314DFF: Perl_regexec_flags (regexec.c:2982)
==15091==    by 0x2BA4D0: Perl_pp_substcont (pp_ctl.c:225)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x16D962: S_run_body (perl.c:2488)
==15091==    by 0x16D962: perl_run (perl.c:2411)
==15091==    by 0x136408: main (perlmain.c:116)
==15091==  Address 0x5c5f48b is 0 bytes after a block of size 59 alloc'd
==15091==    at 0x4C2BBAF: malloc (vg_replace_malloc.c:299)
==15091==    by 0x208FB2: Perl_safesysmalloc (util.c:153)
==15091==    by 0x260557: Perl_sv_grow (sv.c:1605)
==15091==    by 0x26EB55: Perl_sv_setpvn (sv.c:4896)
==15091==    by 0x26F0B8: Perl_sv_copypv_flags (sv.c:3233)
==15091==    by 0x234811: Perl_pp_stringify (pp_hot.c:89)
==15091==    by 0x206AD9: Perl_runops_debug (dump.c:2239)
==15091==    by 0x142850: S_fold_constants (op.c:4381)
==15091==    by 0x1B47A3: Perl_yyparse (perly.y:711)
==15091==    by 0x16BA2A: S_parse_body (perl.c:2336)
==15091==    by 0x16BA2A: perl_parse (perl.c:1650)
==15091==    by 0x136362: main (perlmain.c:114)
==15091== 
==15091== Invalid read of size 1

Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.

2017-06-14 Thread gregor herrmann
On Wed, 14 Jun 2017 19:16:35 +0200, Benjamin Bayart wrote:

> In some cases, some valid UTF-8 Chinese (or Japanese Kanji) chars
> in a perl string make perl die on "Malformed UTF-8" while matching
> a regexp.
> 
> Here is the smallest program (all in ASCII, for safety) creating
> the problem.

Now that's interesting. I ran the script in a loop on my laptop
(amd64, Debian unstable), and it didn't error out a single time in
over 100_000 runs.

OTOH, on one of my raspis (armhf-ish, Raspbian stretch), it didn't
succeed even once in a couple of tries, and always failed
with

Failed Malformed UTF-8 character (fatal) at crash.pl line 8.

And on a third machine, a remote server (amd64, Debian stretch), I
got the first pass only after over 400 failures.

All with perl 5.24.1-3. 

So whatever is going on here seems a bit nondeterministic …
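
A loop of the kind described above could be sketched as follows. This is
only an illustration, not the loop that was actually run: the helper
count_failures() and its arguments are hypothetical, and it assumes that
a failing run prints a message containing "Malformed UTF-8" on stdout or
stderr, as in the output quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a shell command $runs times and count the
# runs whose combined stdout/stderr mentions "Malformed UTF-8".
sub count_failures {
    my ($runs, $cmd) = @_;
    my $failures = 0;
    for (1 .. $runs) {
        # 2>&1 merges stderr into the captured output.
        my $out = qx{$cmd 2>&1};
        $failures++ if defined $out && $out =~ /Malformed UTF-8/;
    }
    return $failures;
}

# Only act when a command is given on the command line, e.g.
#   perl count-failures.pl 'perl crash.pl' 1000
if (@ARGV) {
    my ($cmd, $runs) = @ARGV;
    $runs //= 100;    # assumed default number of runs
    my $failures = count_failures($runs, $cmd);
    print "$failures of $runs runs failed\n";
}
```

Counting on the message text rather than the exit status is a deliberate
choice here, since a script that traps the error in an eval may still
exit with status 0.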

Cheers,
gregor

-- 
 .''`.  https://info.comodo.priv.at/ - Debian Developer https://www.debian.org
 : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D  85FA BB3A 6801 8649 AA06
 `. `'  Member of VIBE!AT & SPI, fellow of the Free Software Foundation Europe
   `-   NP: Various Artists: Black Velvet Band




Bug#864782: perl: Regexp matching crashes claiming string is malformed Utf8, despite it is valid.

2017-06-14 Thread Benjamin Bayart
Package: perl
Version: 5.24.1-3
Severity: normal
Tags: upstream

Dear Maintainer,


In some cases, some valid UTF-8 Chinese (or Japanese Kanji) chars
in a perl string make perl die on "Malformed UTF-8" while matching
a regexp.

Here is the smallest program (all in ASCII, for safety) creating
the problem.


#!/usr/bin/perl

use strict;
use warnings;

my $text = "[quant,_1,\x{55b6}\x{696d}\x{65e5},\x{55b6}\x{696d}\x{65e5}]\x{6bce}";

eval {$text =~ 
s{((?