On 2022-11-19, Stefan Bodewig wrote:

> while running all tests locally I realized
> https://github.com/apache/ant/pull/194 introduces a bug when tars have
> an encoding set. You can see this with UntarTest failing. A multibyte
> encoding like UTF16 may contain NUL bytes inside of the "normal" part of
> the name. I'll have to think about a way to solve this, but we should
> not release Ant with this regression.

https://github.com/apache/ant/commit/e04fbe7ffa4f549bdc00bdfce688fb587120eedd
fixes tthe problem, at least for our test.

It makes parsing tar archives a bit slower, but that likely won't matter
much for single-byte encodings (tar isn't meant to be used with
multi-byte encodings). Also I don't think reading tars is what a normal
build does for the majority of its time.

Anyway, I'd appreciate a review of the code I've written there.

What I wanted to do is to ask the encoding how it would represent a NUL
and look search for that when finding the string terminator - as opposed
to looking for a single NUL byte.

Our testcase used UTF-16 BE with a BOM, so encoding "\0" ends up with
two bytes BOM + 0x00 0x00 - while I really only wanted to 0x00
0x00. Next attempt was to encode an empty string to see whether there is
a common prefix, but an empty string is encoded as 0 bytes, no BOM. :-)

So finally I compared "X" to "X\0" and stripped what seems to be
"X". I'm pretty sure this breaks for certain encodings, but not worse
than it has worked before.

And then I sprinkled some memoization on top.

TBM I could also live with reverting the whole commit, saying "don't use
multi-byte encodings in tars" and be done. The original test we had
worked by accident, if the old test had used UTF16-LE instead and a 49
character file name it would have failed as we'd try to decode a byte
array with an odd number of bytes.

Finding A solution was a nice puzzle, but that doesn't mean we have to
use it.

Stefan


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@ant.apache.org
For additional commands, e-mail: dev-h...@ant.apache.org

Reply via email to