On Oct 9, 2012, at 10:30 PM, Pavel Raiskup wrote:

> On Tue, 2012-10-09 at 21:31 -0700, Tim Kientzle wrote:
>> On Oct 9, 2012, at 12:57 AM, Pavel Raiskup wrote:
>>
>>> On Mon, 2012-10-08 at 15:24 -0700, Paul Eggert wrote:
>>>> On 10/08/2012 08:52 AM, Pavel Raiskup wrote:
>>>>> we are not able to store/restore extended attributes containing '='
>>>>> character in the keyword properly - same situation in star as our patch
>>>>> uses the same approach.
>>>>
>>>> Can't we fix this by URL-encoding '=' and '%' when creating the tar file
>>>> and URL-decoding when extracting? It sounds easy, but it's not done
>>>> by that patch, so what am I missing?
>>>
>>> Yes, that seems to be OK solution. I'll prepare patch today and I'll post
>>> it in the xattr-proposal thread.
>>
>> bsdtar also URL-encodes any characters that
>> aren't printable ASCII.
>>
>> This avoids any problems if a reader expects
>> the key name to be valid UTF-8.
>
> Tim, what problems you mean? I expected that tar implementations are
> searching for '0x3d' byte as a splitting point between KEY and VALUE.
> And all '=' characters in KEY are URL-encoded -> no ambiguity.
>
> As proposed simplified encoding transforms only '%' and '=' characters,
> GNU tar may never transform the string from UTF-8 valid to non-UTF-8
> string?

xattr names in the filesystem can contain any bytes (except NUL) in any
order. POSIX says that key names are UTF-8. So it would be reasonable
for a program reading pax format archives to reject entries where the
key is not valid UTF-8.

GNU tar may not do this, but my concern is that other people may
implement tar readers in other ways.

In any case, this is how LIBARCHIVE.xattr works. I think you are
implementing SCHILY.xattr, so what really matters is what star does.
I think it does not do any URL-encoding at all, but you should ask Joerg.

Tim
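
For illustration, the encoding discussed in this thread (escape '%', '=',
and every byte that is not printable ASCII) could be sketched roughly as
below. This is only a minimal sketch, not libarchive's or GNU tar's code:
the function names are hypothetical, and treating bytes 0x21 through 0x7E
as "printable ASCII" is an assumption made for the example.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def encode_xattr_name(name: bytes) -> str:
    """Percent-encode an xattr name so the result is plain ASCII without '='.

    Assumption: "printable ASCII" means bytes 0x21..0x7E.
    """
    out = []
    for b in name:
        # Escape '%' (0x25), '=' (0x3D), and anything outside 0x21..0x7E.
        if b < 0x21 or b > 0x7E or b in (0x25, 0x3D):
            out.append("%{:02X}".format(b))
        else:
            out.append(chr(b))
    return "".join(out)


def decode_xattr_name(encoded: str) -> bytes:
    """Reverse encode_xattr_name; a '%' not followed by two hex digits is kept as-is."""
    raw = encoded.encode("ascii")
    out = bytearray()
    i = 0
    while i < len(raw):
        pair = raw[i + 1:i + 3]
        if raw[i] == 0x25 and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(int(pair.decode("ascii"), 16))
            i += 3
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


def split_record(record: bytes):
    """Split 'KEY=VALUE' at the first 0x3D byte, as a reader is expected to do."""
    key, _, value = record.partition(b"=")
    return key, value


if __name__ == "__main__":
    # A name containing '=', '%', and a byte that is not valid UTF-8 by itself.
    name = b"user.a=b%\xff"
    encoded = encode_xattr_name(name)
    print(encoded)
    print(decode_xattr_name(encoded) == name)
```

Because the encoded key is pure ASCII, it is also valid UTF-8, and the
first '=' in a record is always the KEY/VALUE separator. Escaping only
'%' and '=' gives the second property but not the first, since a
non-UTF-8 name on disk stays non-UTF-8 in the archive.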
