On 12/19/2017 03:31 PM, Bernhard Voelker wrote:
The test case in your attachment is a bit different, but also shows the problem. It seems that gnulib's regex does not find a match for the pattern '.*\.exe$' for the files in the following directory:

  $ LC_ALL=C /usr/bin/ls -log htdocs
  ...
  drwxr-xr-x 2 4096 Dec 18 20:45 'Zielona G'$'\363''ra'
  ...

I'm not an expert on UTF and regex, but it seems that the $'\363' character is not matched by the dot '.' meta character in your locale.
POSIX says that regex only has to match characters (in particular, the '.' metacharacter matches characters, not encoding errors). If you pick a locale with multibyte characters that are subject to encoding errors when processing random bytes (as is the case when using a UTF-8 locale to process single-byte ISO filenames), then POSIX says regex behavior is undefined. So while it is indeed annoying that find can't match files with encoding errors, it is somewhat expected behavior, because there's no sane way to make regex well-specified on encoding errors.
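To make the encoding-error point concrete, here is a small Python sketch. Python is used only for illustration: its `re` module is not the gnulib/glibc matcher that find uses. The sketch checks where the reported directory name stops being valid UTF-8, and then does an anchored, byte-oriented match, roughly what the C locale gives you when every byte counts as a character. The file name `setup.exe` and the helpers `first_utf8_error` and `matches_bytewise` are hypothetical.

```python
import re

# Hypothetical path: the directory name comes from the report, while
# "setup.exe" is an invented file name for illustration only.
PATH = b"Zielona G\xf3ra/setup.exe"


def first_utf8_error(raw: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def matches_bytewise(pattern: bytes, raw: bytes) -> bool:
    """Anchored match where '.' matches any single byte except newline.

    This only mimics a byte-oriented (C locale) matcher; it is not POSIX regex.
    """
    return re.fullmatch(pattern, raw) is not None


if __name__ == "__main__":
    # 0xF3 (octal 363) starts a 4-byte UTF-8 sequence, but the next byte,
    # 'r' (0x72), is not a continuation byte, so a UTF-8 decoder rejects it.
    print(first_utf8_error(PATH))  # offset of the 0xF3 byte
    # The same bytes decode cleanly as ISO-8859-2, where 0xF3 is U+00F3.
    print(PATH.decode("iso-8859-2") == "Zielona G\u00f3ra/setup.exe")
    # Treating every byte as a character, '.*\.exe' matches the whole path.
    print(matches_bytewise(rb".*\.exe", PATH))
```

`re.fullmatch` stands in for the implicit anchoring of the original '.*\.exe$' against the whole path. In a UTF-8 locale there is no character at offset 9 for '.' to match, which is consistent with the report above.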
-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org