Hi all,

I've been working on COMPRESS-183, which is a more general version of COMPRESS-114 that we fixed a while ago. It asks for support for non-ASCII file names in tar archives by using an explicit encoding (COMPRESS-114 made things work for ISO-8859-1 and any other encoding that creates the same bytes for chars 0 to 255).
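
To make the problem concrete, here is a minimal sketch of why the encoding matters for the name bytes stored in a tar header. It uses only the JDK's charset classes, not the actual patch; the class and the helpers encodeName/decodeName are hypothetical names made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical helpers, not part of Commons Compress.
final class TarNameEncodingDemo {

    // Turn an entry name into the bytes that would be written to the header.
    static byte[] encodeName(String name, Charset charset) {
        return name.getBytes(charset);
    }

    // Turn header bytes back into an entry name using the given charset.
    static String decodeName(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "äöü.txt", written with escapes so the source file encoding is irrelevant
        String name = "\u00e4\u00f6\u00fc.txt";

        byte[] latin1 = encodeName(name, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encodeName(name, StandardCharsets.UTF_8);

        // One byte per char in ISO-8859-1, two bytes per umlaut in UTF-8.
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("UTF-8 bytes: " + utf8.length);

        // Reading with the charset used for writing restores the name.
        System.out.println(decodeName(utf8, StandardCharsets.UTF_8).equals(name));

        // Reading UTF-8 bytes as ISO-8859-1 silently yields a different name.
        System.out.println(decodeName(utf8, StandardCharsets.ISO_8859_1).equals(name));
    }
}
```

The last case is the one that bites: decoding with the wrong charset does not fail, it just produces a garbled name, which is why the reader needs to be told which encoding the writer used.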
tar itself doesn't support anything but ASCII, and only the later POSIX versions added support for UTF-8 via PAX extension headers (something I intend to add). Most tar dialects will use the platform's default encoding for non-ASCII names.

I have checked in some initial infrastructure that reuses the zip package's encoding classes and already allows reading names in any encoding; adding write support will be trivial.

The patch is more convoluted than I had hoped, as the tar package has way too many public methods and I had to work around backwards compatibility issues, including swallowing exceptions that may occur if the specified encoding doesn't work for the name/bytes. This is something to address in Compress 2.x (which I hope to kick off after releasing 1.4).

Anyway, the current code changes one thing: it now defaults to the platform's default encoding, while the 1.3 version specifically supports ISO-8859-1 (and nothing else). Anybody who relied on ISO-8859-1 being the default will have to change their code to ask for it explicitly.

Is this acceptable or do I need to change the default?

Stefan