Hello list!

Using GNU tar 1.19, I was unable to extract files from a tar
archive using wildcards when the file name contains a byte
sequence that is not valid in the current locale.  Specifically,
this didn't work as I expected:

$ touch $(echo -en 'file-\0344')

$ tar -vcf my.tar file-*
file-\344

$ tar -tf my.tar
file-\344

$ tar -tf my.tar --wildcards '*'
tar: *: Not found in archive
tar: Error exit delayed from previous errors

$ LC_CTYPE=C tar -tf my.tar --wildcards '*'
file-\344

The byte \344 is the character 'ä' ('a' with umlaut) in Latin-1.
The files in my archive came from a system with a different
encoding.  But regardless of the file names in the archive not
matching the system's encoding, I would still expect the pattern
'*' to match all files.
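
To check whether a given name is affected, one can test whether its
bytes decode in the locale's encoding.  The following is only a
sketch: it assumes the current locale uses UTF-8, and is_valid_utf8
is a hypothetical helper name, not something tar provides.

```shell
# Hypothetical helper: succeeds only if the argument is valid UTF-8.
# iconv exits non-zero when it meets an illegal input sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

name=$(printf 'file-\344')    # the lone byte 0xE4, as in the example above

if is_valid_utf8 "$name"; then
    echo "name is valid UTF-8"
else
    echo "name is not valid UTF-8"
fi
```

Interpreting the same bytes as Latin-1 instead (iconv -f ISO-8859-1
-t UTF-8) should succeed, since every byte is a valid Latin-1
character.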

This can be very annoying when the pattern doesn't fail entirely.
For example, the following command may silently fail to extract
some of the files:

$ tar -xf my.tar --wildcards --wildcards-match-slash 'sub/dir/*'

As a work-around, setting LC_CTYPE=C seems to make it work.
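
A minimal sketch of wrapping that work-around so it applies to a
single tar invocation only.  The function name tar_c_locale is made
up for illustration.  It sets LC_ALL rather than LC_CTYPE because
LC_ALL, if present in the environment, takes precedence over
LC_CTYPE; note that this also switches tar's messages to the C
locale.

```shell
# Hypothetical wrapper: run tar with the C locale for this one command,
# so that wildcard patterns are matched byte by byte.
tar_c_locale() {
    LC_ALL=C tar "$@"
}

# Usage, with the archive and pattern from the example above:
#   tar_c_locale -xf my.tar --wildcards --wildcards-match-slash 'sub/dir/*'
```

The locale of the calling shell is left untouched, because the
variable assignment is scoped to the tar command itself.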

Thanks,
jlh

