Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal

2019-08-25 Thread Leszek Dubiel



Still happens with:

This is perl 5, version 28, subversion 1 (v5.28.1) built for i686-linux-gnu-thread-multi-64int

(with 61 registered patches, see perl -V for more detail)



# printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'

Malformed UTF-8 character: \x9c (unexpected continuation byte 0x9c, with no preceding start byte) in pattern match (m//) at -e line 1, <> line 1.

Malformed UTF-8 character (fatal) at -e line 1, <> line 1.



Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal

2017-01-29 Thread Leszek Dubiel



> This still happens with 5.24.1-1. It can be reduced to
>
>   printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'
>
> The byte sequence is indeed invalid utf8 (as shown by iconv as well),
> but you're explicitly telling Perl (with -CS) that it's getting utf8 on
> stdin. This is a recipe for problems.
>
> So I'm not sure if it's a bug at all. At most the failure should be
> handled a bit more gracefully.


This should be a warning, one that the programmer could turn off. It is
very rare for Perl to simply die because the data it handles is not as
expected.

Wrong data? OK, fine: warn, and go on processing.

It is strange that /[^#]*/ dies while /(.*)/ doesn't. In both cases the
data is the same, and the data is wrong. Sometimes Perl dies, sometimes
it doesn't.




Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal

2017-01-29 Thread Niko Tyni
Control: found -1 5.24.1-1

On Sun, Jan 29, 2017 at 06:23:30PM +0100, Leszek Dubiel wrote:
> Package: perl
> Version: 5.20.2-3+deb8u6
> Severity: normal
> 
> This is stripped out program version that causes error: 
> 
>   printf "\x41\x9c\x5a\x0a" | perl -CS -e '$_ = <>; /^(.*)$/ && print 
> "($1)\n"; /[^#]*/;'
> 
> It displays: 
> 
>   (A�Z)
>   Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
> 
> Locale is pl_PL.UTF-8 . 

This still happens with 5.24.1-1. It can be reduced to

 printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'

The byte sequence is indeed invalid utf8 (as shown by iconv as well),
but you're explicitly telling Perl (with -CS) that it's getting utf8 on
stdin. This is a recipe for problems.

So I'm not sure if it's a bug at all. At most the failure should be
handled a bit more gracefully.
-- 
Niko Tyni   nt...@debian.org
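
The iconv check mentioned above can be reproduced roughly as follows.
This is a sketch that assumes an iconv binary on the PATH; the variable
name "verdict" is made up for illustration. It converts the two bytes
from UTF-8 to UTF-8, which only succeeds if the input is well-formed.

```shell
# 0x9c 0x5a in octal, for portability between dash and bash printf.
if printf '\234\132' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    verdict="valid UTF-8"
else
    verdict="invalid UTF-8"
fi
echo "$verdict"
```

0x9c is a continuation byte with no preceding start byte, so iconv
rejects the input and the else branch is taken.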



Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal

2017-01-29 Thread Leszek Dubiel
Package: perl
Version: 5.20.2-3+deb8u6
Severity: normal

This is a stripped-down version of the program that causes the error:

printf "\x41\x9c\x5a\x0a" | perl -CS -e '$_ = <>; /^(.*)$/ && print "($1)\n"; /[^#]*/;'

It displays: 

(A�Z)
Malformed UTF-8 character (fatal) at -e line 1, <> line 1.

Locale is pl_PL.UTF-8.


-- System Information:
Debian Release: 8.7
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.16.0-4-686-pae (SMP w/4 CPU cores)
Locale: LANG=pl_PL.UTF-8, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages perl depends on:
ii  dpkg          1.17.27
ii  libbz2-1.0    1.0.6-7+b3
ii  libc6         2.19-18+deb8u7
ii  libdb5.3      5.3.28-9
ii  libgdbm3      1.8.3-13.1
ii  perl-base     5.20.2-3+deb8u6
ii  perl-modules  5.20.2-3+deb8u6
ii  zlib1g        1:1.2.8.dfsg-2+b1

Versions of packages perl recommends:
ii  netbase  5.3
pn  rename   <none>

Versions of packages perl suggests:
pn  libterm-readline-gnu-perl | libterm-readline-perl-perl  <none>
pn  make                                                    <none>
pn  perl-doc                                                <none>

-- no debconf information