Ruiqi Dong created CODEC-343:
--------------------------------

             Summary: Base32.Builder#setHexDecodeTable(boolean) sets the encode 
table to the decode table, corrupting encoding
                 Key: CODEC-343
                 URL: https://issues.apache.org/jira/browse/CODEC-343
             Project: Commons Codec
          Issue Type: Bug
            Reporter: Ruiqi Dong


*Summary*
`Base32.Builder#setHexDecodeTable(boolean)` is implemented as 
`setEncodeTable(decodeTable(useHex))` — it passes the **decode** lookup table 
to `setEncodeTable(...)`. Used on its own, the resulting `Base32` therefore 
encodes with the decode lookup array (whose low entries are the `-1` sentinel) 
instead of the Base32-Hex alphabet, so it emits bytes outside the alphabet and 
cannot decode its own output.
 
*Affected code*
File: `src/main/java/org/apache/commons/codec/binary/Base32.java`
{code:java}
public Builder setHexDecodeTable(final boolean useHex) {
    return setEncodeTable(decodeTable(useHex)); // passes the DECODE table to setEncodeTable
} {code}
`decodeTable(useHex)` returns `HEX_DECODE_TABLE` / `DECODE_TABLE` (the lookup 
arrays used for decoding). Passing one of those to `setEncodeTable(...)` makes 
it the encode table, so encoding reads `-1` sentinels and emits invalid bytes.
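
The effect of the mix-up can be sketched without Commons Codec. The class below is a hypothetical stand-in, not the library's code: `Base32TableMixup`, `buildDecodeTable`, and `encodeSingleByte` are illustrative names, and the 128-entry decode table size is an assumption. It only shows that indexing a decode-style lookup array with a 5-bit value lands on a `-1` entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for illustration; not the Commons Codec implementation. */
final class Base32TableMixup {

    /** Base32-Hex alphabet (RFC 4648): index = 5-bit value, entry = output character. */
    static final byte[] HEX_ENCODE =
            "0123456789ABCDEFGHIJKLMNOPQRSTUV".getBytes(StandardCharsets.US_ASCII);

    /** Inverse lookup: index = character code, entry = 5-bit value, or -1 if not in the alphabet. */
    static byte[] buildDecodeTable(final byte[] encodeTable) {
        final byte[] decode = new byte[128]; // assumed size, large enough for ASCII
        Arrays.fill(decode, (byte) -1);
        for (int i = 0; i < encodeTable.length; i++) {
            decode[encodeTable[i]] = (byte) i;
        }
        return decode;
    }

    /** Encodes one input byte as two symbols looked up in {@code table}, plus six '=' pads. */
    static byte[] encodeSingleByte(final byte[] table, final byte b) {
        final int v = b & 0xFF;
        final byte[] out = new byte[8];
        out[0] = table[(v >> 3) & 0x1F]; // top 5 bits
        out[1] = table[(v << 2) & 0x1F]; // remaining 3 bits, left-aligned in a 5-bit group
        Arrays.fill(out, 2, 8, (byte) '=');
        return out;
    }

    public static void main(final String[] args) {
        final byte[] hexDecode = buildDecodeTable(HEX_ENCODE);
        // Correct table: prints 00======
        System.out.println(new String(encodeSingleByte(HEX_ENCODE, (byte) 0), StandardCharsets.US_ASCII));
        // Decode table used as encode table: prints [-1, -1, 61, 61, 61, 61, 61, 61]
        System.out.println(Arrays.toString(encodeSingleByte(hexDecode, (byte) 0)));
    }
}
```

Every 5-bit value (0 to 31) indexes a control-character slot of the decode-style array, so in this sketch any input byte yields `-1` symbols, not only `0`.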
 
The only test that touches it chains another setter right after:
{code:java}
Base32.builder()
      .setHexDecodeTable(false)
      .setHexDecodeTable(true)
      .setHexEncodeTable(false)
      .setHexEncodeTable(true)   // "last set wins" overwrites the broken encode table
      ... {code}
The trailing `setHexEncodeTable(true)` restores a correct encode table, masking 
the defect, so the bug never surfaces when `setHexDecodeTable` is used in 
isolation.
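
The masking can be illustrated with a miniature builder. This is a hypothetical sketch: only the two setter names mirror the report, while `LastSetWinsSketch`, `encodeTableInUse`, and the string labels standing in for the real tables are assumptions made for illustration.

```java
/** Hypothetical miniature builder; string labels stand in for the real lookup tables. */
final class LastSetWinsSketch {

    static final class Builder {
        private String encodeTable = "STANDARD_ENCODE";

        // Mimics the reported defect: writes a decode table into the encode slot.
        Builder setHexDecodeTable(final boolean useHex) {
            encodeTable = useHex ? "HEX_DECODE" : "STANDARD_DECODE";
            return this;
        }

        // Behaves correctly and overwrites whatever the encode slot held before.
        Builder setHexEncodeTable(final boolean useHex) {
            encodeTable = useHex ? "HEX_ENCODE" : "STANDARD_ENCODE";
            return this;
        }

        String encodeTableInUse() {
            return encodeTable;
        }
    }

    public static void main(final String[] args) {
        final String chained = new Builder()
                .setHexDecodeTable(false)
                .setHexDecodeTable(true)
                .setHexEncodeTable(false)
                .setHexEncodeTable(true)
                .encodeTableInUse();
        final String isolated = new Builder().setHexDecodeTable(true).encodeTableInUse();
        System.out.println(chained);  // HEX_ENCODE (defect hidden by the last setter)
        System.out.println(isolated); // HEX_DECODE (defect visible)
    }
}
```

A test that asserts on the result of `setHexDecodeTable(...)` alone, with no later encode-table setter in the chain, would not be masked in this way.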
 
*Reproducer*
Add the following test to 
`src/test/java/org/apache/commons/codec/binary/Base32Test.java`:
{code:java}
@Test
void testBuilderSetHexDecodeTableEncodesWithHexAlphabet() {
    final Base32 base32 = Base32.builder().setHexDecodeTable(true).setLineLength(0).get();
    final byte[] data = { 0 };
    final byte[] encoded = base32.encode(data);
    assertEquals("00======", new String(encoded, StandardCharsets.US_ASCII),
            "setHexDecodeTable(true) should encode with the Base32-Hex alphabet");
    assertArrayEquals(data, base32.decode(encoded),
            "the instance should decode its own output");
} {code}
Run:
{code:bash}
mvn -q -Dtest=org.apache.commons.codec.binary.Base32Test#testBuilderSetHexDecodeTableEncodesWithHexAlphabet test {code}
*Observed behavior*
Encoding `{ 0 }` does not produce the Base32-Hex form `"00======"`. The 
encoder emits the `-1` sentinel from the decode table as `0xFF`:
{code:java}
encode({0}) -> bytes [-1, -1, 61, 61, 61, 61, 61, 61]   // 0xFF 0xFF ======
decode(...) -> []                                        // round-trip lost 
{code}
So the encoding assertion fails and the instance cannot decode its own output.
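
One plausible reading of the empty decode result is that the decoder skips bytes outside its alphabet and stops at the first pad. The sketch below assumes that lenient behavior; it is a hypothetical decoder written for illustration (`LenientDecodeSketch` and its 128-entry lookup are not taken from Commons Codec) and is not a copy of the library's decode loop.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical lenient Base32-Hex decoder; assumes non-alphabet bytes are skipped. */
final class LenientDecodeSketch {

    static final byte[] HEX_ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUV".getBytes(StandardCharsets.US_ASCII);

    static byte[] decode(final byte[] input) {
        // Build the lookup: index = character code, entry = 5-bit value or -1.
        final byte[] lookup = new byte[128];
        Arrays.fill(lookup, (byte) -1);
        for (int i = 0; i < HEX_ALPHABET.length; i++) {
            lookup[HEX_ALPHABET[i]] = (byte) i;
        }

        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0; // holds the bits not yet written out
        int bits = 0;   // number of valid bits in buffer
        for (final byte b : input) {
            if (b == '=') {
                break; // first pad ends the data; leftover bits are discarded
            }
            if (b < 0 || lookup[b] < 0) {
                continue; // 0xFF is a negative byte, so it is skipped here
            }
            buffer = (buffer << 5) | lookup[b];
            bits += 5;
            if (bits >= 8) {
                bits -= 8;
                out.write((buffer >> bits) & 0xFF);
                buffer &= (1 << bits) - 1; // keep only the unwritten bits
            }
        }
        return out.toByteArray();
    }

    public static void main(final String[] args) {
        final byte[] corrupted = { -1, -1, '=', '=', '=', '=', '=', '=' };
        System.out.println(decode(corrupted).length); // 0
        final byte[] valid = "00======".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(decode(valid))); // [0]
    }
}
```

Under that assumption, both `0xFF` bytes are dropped before any bits accumulate, so the pad is reached with nothing to emit and the original byte is lost.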
 
*Expected behavior*
`setHexDecodeTable(true)` should configure a `Base32` that encodes with the 
Base32-Hex alphabet and decodes its own output. To do so, it must pass an 
encode table, not a decode table, to `setEncodeTable(...)`:
{code:java}
public Builder setHexDecodeTable(final boolean useHex) {
    return setEncodeTable(encodeTable(useHex));
} {code}
 
`setHexDecodeTable(...)` is a public builder API (`@since 1.18.0`). When used 
on its own — the natural way to select the Base32-Hex variant — it produces an 
instance that emits non-alphabet bytes and corrupts data, because the encode 
and decode tables are crossed.
 
 
This is in the same family as the custom-alphabet decode mismatches in 
`Base16.Builder` (CODEC-341 [https://issues.apache.org/jira/browse/CODEC-341]) 
and `Base32.Builder` (CODEC-342 [https://issues.apache.org/jira/browse/CODEC-342]).
 
 



