Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal
Still happens with:

This is perl 5, version 28, subversion 1 (v5.28.1) built for i686-linux-gnu-thread-multi-64int
(with 61 registered patches, see perl -V for more detail)

# printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'
Malformed UTF-8 character: \x9c (unexpected continuation byte 0x9c, with no preceding start byte) in pattern match (m//) at -e line 1, <> line 1.
Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal
> This still happens with 5.24.1-1. It can be reduced to
>
>   printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'
>
> The byte sequence is indeed invalid utf8 (as shown by iconv as well),
> but you're explicitly telling Perl (with -CS) that it's getting utf8
> on stdin. This is a recipe for problems.
>
> So I'm not sure if it's a bug at all. At most the failure should be
> handled a bit more gracefully.

This should be a warning, and one that the programmer can turn off. It is very rare for perl to simply die because the data it handles is not as expected. Wrong data? Ok, fine, warn, and go on processing.

What is strange is that /[^#]*/ dies while /(.*)/ doesn't. In both cases the data is the same, and the data is wrong. Sometimes perl dies, sometimes not.
Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal
Control: found -1 5.24.1-1

On Sun, Jan 29, 2017 at 06:23:30PM +0100, Leszek Dubiel wrote:
> Package: perl
> Version: 5.20.2-3+deb8u6
> Severity: normal
>
> This is stripped out program version that causes error:
>
> printf "\x41\x9c\x5a\x0a" | perl -CS -e '$_ = <>; /^(.*)$/ && print "($1)\n"; /[^#]*/;'
>
> It displays:
>
> (A�Z)
> Malformed UTF-8 character (fatal) at -e line 1, <> line 1.
>
> Locale is pl_PL.UTF-8 .

This still happens with 5.24.1-1. It can be reduced to

  printf "\x9c\x5a" | perl -CI -ne '/[^#]*/'

The byte sequence is indeed invalid utf8 (as shown by iconv as well),
but you're explicitly telling Perl (with -CS) that it's getting utf8
on stdin. This is a recipe for problems.

So I'm not sure if it's a bug at all. At most the failure should be
handled a bit more gracefully.

-- 
Niko Tyni nt...@debian.org
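The iconv check mentioned above can be reproduced roughly as follows. This is a minimal sketch, assuming an iconv that exits non-zero on an illegal input sequence (GNU iconv does); the helper name is_valid_utf8 is hypothetical. The bytes are given in octal (\234\132 is 0x9c 0x5a) so that the example does not depend on the shell's printf supporting \x escapes.

```shell
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
# Converting UTF-8 to UTF-8 makes iconv act as a validator.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Feed it the two bytes from the reduced test case.
if printf '\234\132' | is_valid_utf8; then
    result=valid
else
    result=invalid
fi
echo "bytes 0x9c 0x5a: $result"
```

A lone 0x9c is a continuation byte with no preceding start byte, which is the same complaint perl prints in its more detailed diagnostic.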
Bug#853091: perl: Dying when matching simple regex: Malformed UTF-8 character fatal
Package: perl
Version: 5.20.2-3+deb8u6
Severity: normal

This is stripped out program version that causes error:

printf "\x41\x9c\x5a\x0a" | perl -CS -e '$_ = <>; /^(.*)$/ && print "($1)\n"; /[^#]*/;'

It displays:

(A�Z)
Malformed UTF-8 character (fatal) at -e line 1, <> line 1.

Locale is pl_PL.UTF-8 .

-- System Information:
Debian Release: 8.7
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: i386 (i686)

Kernel: Linux 3.16.0-4-686-pae (SMP w/4 CPU cores)
Locale: LANG=pl_PL.UTF-8, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages perl depends on:
ii  dpkg          1.17.27
ii  libbz2-1.0    1.0.6-7+b3
ii  libc6         2.19-18+deb8u7
ii  libdb5.3      5.3.28-9
ii  libgdbm3      1.8.3-13.1
ii  perl-base     5.20.2-3+deb8u6
ii  perl-modules  5.20.2-3+deb8u6
ii  zlib1g        1:1.2.8.dfsg-2+b1

Versions of packages perl recommends:
ii  netbase  5.3
pn  rename

Versions of packages perl suggests:
pn  libterm-readline-gnu-perl | libterm-readline-perl-perl
pn  make
pn  perl-doc

-- no debconf information