On 2022-11-19, Stefan Bodewig wrote:
> while running all tests locally I realized
> https://github.com/apache/ant/pull/194 introduces a bug when tars have
> an encoding set. You can see this with UntarTest failing. A multibyte
> encoding like UTF16 may contain NUL bytes inside of the "normal" part of
> the name. I'll have to think about a way to solve this, but we should
> not release Ant with this regression.
https://github.com/apache/ant/commit/e04fbe7ffa4f549bdc00bdfce688fb587120eedd fixes the problem, at least for our test. It makes parsing tar archives a bit slower, but that likely won't matter much for single-byte encodings (tar isn't meant to be used with multi-byte encodings). Also, I don't think reading tars is where a normal build spends most of its time.

Anyway, I'd appreciate a review of the code I've written there.

What I wanted to do is ask the encoding how it would represent a NUL and search for that when finding the string terminator, as opposed to looking for a single NUL byte. Our testcase used UTF-16 BE with a BOM, so encoding "\0" ends up as the two-byte BOM followed by 0x00 0x00, while I really only wanted 0x00 0x00. My next attempt was to encode an empty string to see whether there is a common prefix, but an empty string is encoded as 0 bytes, no BOM. :-) So finally I compared "X" to "X\0" and stripped what seems to be "X". I'm pretty sure this breaks for certain encodings, but it is no worse than how it worked before. And then I sprinkled some memoization on top.

TBH I could also live with reverting the whole commit, saying "don't use multi-byte encodings in tars" and be done. The original test we had worked by accident: if the old test had used UTF16-LE instead and a 49 character file name, it would have failed because we'd try to decode a byte array with an odd number of bytes. Finding a solution was a nice puzzle, but that doesn't mean we have to use it.

Stefan
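For reviewers, here is a minimal Java sketch of the idea described in the mail: learn the encoded form of NUL by diffing "X" against "X\0", memoize it per charset, and search for that byte sequence as the name terminator. It is not the code from the commit; the class and method names (`TarNameTerminator`, `encodedNul`, `nameLength`, `parseName`) are hypothetical. It also adds an assumption of its own: the terminator is only matched at offsets that are multiples of the encoded NUL's length, which suits single-byte encodings and fixed-width units such as UTF-16.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TarNameTerminator {

    // Hypothetical memo: charset -> bytes that encode a single NUL character.
    private static final Map<Charset, byte[]> NUL_BY_CHARSET = new ConcurrentHashMap<>();

    // Ask the charset how it encodes NUL by diffing "X" against "X\0".
    // Encoding "\0" alone would include a BOM for UTF-16, and "" encodes to zero
    // bytes, so neither works directly. Assumes the encoding of "X" is a prefix of
    // the encoding of "X\0", which may not hold for every charset.
    static byte[] encodedNul(Charset cs) {
        return NUL_BY_CHARSET.computeIfAbsent(cs, c -> {
            byte[] x = "X".getBytes(c);
            byte[] xNul = "X\0".getBytes(c);
            return Arrays.copyOfRange(xNul, x.length, xNul.length);
        });
    }

    // Offset of the first encoded NUL in a NUL-padded header field, checking only
    // offsets that are multiples of the encoded NUL's length (assumption: the
    // encoding uses fixed-size code units of that length, or single bytes).
    static int nameLength(byte[] field, Charset cs) {
        byte[] nul = encodedNul(cs);
        for (int i = 0; i + nul.length <= field.length; i += nul.length) {
            if (Arrays.equals(field, i, i + nul.length, nul, 0, nul.length)) {
                return i;
            }
        }
        return field.length; // no terminator: the name fills the whole field
    }

    // Decode only the bytes before the terminator.
    static String parseName(byte[] field, Charset cs) {
        return new String(field, 0, nameLength(field, cs), cs);
    }

    public static void main(String[] args) {
        Charset utf16 = StandardCharsets.UTF_16;   // Java's UTF-16 encoder writes a big-endian BOM
        byte[] field = new byte[100];              // a ustar name field is 100 bytes, NUL padded
        byte[] name = "a\u0100b".getBytes(utf16);  // FE FF 00 61 01 00 00 62
        System.arraycopy(name, 0, field, 0, name.length);
        System.out.println(nameLength(field, utf16));                   // prints 8
        System.out.println(parseName(field, utf16).equals("a\u0100b")); // prints true
    }
}
```

In the demo, a search for 0x00 0x00 at every offset would match at offset 5 (the tail of `01 00` plus the head of `00 62`), inside the name, and would leave an odd number of bytes to decode; the aligned search finds the real terminator at offset 8. The alignment rule is a design choice of this sketch only, and it can still misbehave for encodings where its assumptions do not hold.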