Hello list!

jlh wrote:
> export LC_ALL=en_US.utf8
> $ touch $(echo -en 'file-\0344')
> $ tar -vcf my.tar file-*
> file-\344
> $ tar -tf my.tar
> file-\344
> $ tar -tf my.tar --wildcards '*'
> tar: *: Not found in archive
> tar: Error exit delayed from previous errors

Ok, here's an update. I was able to track down the cause of this problem.

In order to match file names against patterns, tar uses fnmatch(3), which is provided by glibc. This happens in lib/exclude.c:149:exclude_fnmatch(). fnmatch() is documented to return 0 on a successful match, FNM_NOMATCH (defined to be 1) on a non-match, and anything else on error. exclude_fnmatch() only compares the return value to 0 and thus treats a non-match and an error the same way.

The particular problem I'm experiencing triggers an error: fnmatch() returns -1, and perror() says "Invalid or incomplete multibyte or wide character". The message is correct, since the byte \344 on its own is invalid in UTF-8, but I was under the impression that a path component may consist of any sequence of non-NUL, non-slash bytes. Since fnmatch() is specifically aimed at matching paths, I would think it should also handle the case where a path component contains arbitrary bytes.

I've been able to reproduce this error with a small stand-alone test case that calls fnmatch(), so this is not a tar problem anymore (except that tar doesn't check for errors). I will take it to the glibc list.

One other comment: I also noticed that tar calls fnmatch() with the flag value 0x50000008 in this particular case. The low set bit (0x8) corresponds to the flag FNM_LEADING_DIR, but the two high bits have no meaning to fnmatch() as far as I can see; they're only used by tar itself internally. Does it say somewhere that one may set undefined bits in flags and expect things to still work? It seems to work here, but I thought I'd comment on this.

Thanks,
jlh
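
P.S. To illustrate the two points above, here is a minimal sketch of how a caller could tell a non-match apart from an error, and how it could strip private bits before calling fnmatch(). This is not tar's or gnulib's actual code: classify_fnmatch(), strip_private_flags(), KNOWN_FNM_FLAGS and demo() are names made up for this example, and the mask assumes the glibc flag set (FNM_LEADING_DIR and FNM_CASEFOLD are GNU extensions). Whether the last call in demo() reports an error depends on the glibc version and on the en_US.utf8 locale being installed, so no output is claimed for it.

```c
#define _GNU_SOURCE             /* for FNM_LEADING_DIR and FNM_CASEFOLD */
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical three-way result; not part of tar or gnulib. */
enum match_result { MATCHED, NOT_MATCHED, MATCH_ERROR };

/* Map fnmatch()'s return value to three outcomes instead of two:
   0 is a match, FNM_NOMATCH is a non-match, anything else is an error. */
static enum match_result
classify_fnmatch (const char *pattern, const char *name, int flags)
{
  int r = fnmatch (pattern, name, flags);
  if (r == 0)
    return MATCHED;
  if (r == FNM_NOMATCH)
    return NOT_MATCHED;
  return MATCH_ERROR;
}

/* Assumed set of flags that fnmatch() itself understands on glibc. */
#define KNOWN_FNM_FLAGS \
  (FNM_NOESCAPE | FNM_PATHNAME | FNM_PERIOD | FNM_LEADING_DIR | FNM_CASEFOLD)

/* Drop any caller-private bits so fnmatch() only sees flags it defines. */
static int
strip_private_flags (int flags)
{
  return flags & KNOWN_FNM_FLAGS;
}

/* Try the pattern from the report against a name ending in byte 0xE4. */
static int
demo (void)
{
  static const char *const label[] = { "match", "no match", "error" };

  /* Sanity check with plain ASCII, which matches in any locale. */
  assert (classify_fnmatch ("file-*", "file-a", 0) == MATCHED);

  /* May fail if the locale is not installed; the "C" locale is kept then. */
  if (setlocale (LC_ALL, "en_US.utf8") == NULL)
    fprintf (stderr, "en_US.utf8 not available, staying in current locale\n");

  int flags = strip_private_flags (0x50000008);
  enum match_result res = classify_fnmatch ("*", "file-\344", flags);
  printf ("fnmatch(\"*\", \"file-\\344\") -> %s\n", label[res]);
  return res == MATCH_ERROR;
}
```

Calling demo() from main() prints which of the three outcomes was seen. With a three-way result like this, a caller could report the error case separately instead of silently treating the name as "not found in archive".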
