Ruiqi Dong created CODEC-344:
--------------------------------

             Summary: Base64.Builder#setEncodeTable(...) accepts invalid custom 
alphabets
                 Key: CODEC-344
                 URL: https://issues.apache.org/jira/browse/CODEC-344
             Project: Commons Codec
          Issue Type: Bug
            Reporter: Ruiqi Dong


*Summary*

`Base64.Builder#setEncodeTable(...)` accepts invalid custom 64-byte encode 
tables without complete validation. This is not the same issue as the Base58 
custom alphabet bug: Base64 does use its configured encode/decode tables for 
instance operations.

However, invalid Base64 alphabets can still create broken codecs or throw the 
wrong exception type:

1. Duplicate alphabet bytes are accepted silently, and the codec can no longer 
decode its own output correctly.
2. A non-ASCII byte (`>= 0x80`) throws `ArrayIndexOutOfBoundsException` while 
building the decode table.
3. An alphabet containing the configured padding byte is accepted silently, and 
encoded data can be interpreted as padding during decode.
 
*Affected code*

`org.apache.commons.codec.binary.Base64`

`calculateDecodeTable(...)` builds the custom decode table without duplicate 
checking or unsigned byte indexing:
{code:java}
private static byte[] calculateDecodeTable(final byte[] encodeTable) {
    final byte[] decodeTable = new byte[DECODING_TABLE_LENGTH];
    Arrays.fill(decodeTable, (byte) -1);
    for (int i = 0; i < encodeTable.length; i++) {
        decodeTable[encodeTable[i]] = (byte) i;
    }
    return decodeTable;
} {code}
`setEncodeTable(...)` derives a decode table, but it does not reject duplicate 
entries or entries that conflict with the configured padding byte:
{code:java}
@Override
public Builder setEncodeTable(final byte... encodeTable) {
    final boolean isStandardEncodeTable = Arrays.equals(encodeTable, 
STANDARD_ENCODE_TABLE);
    final boolean isUrlSafe = Arrays.equals(encodeTable, URL_SAFE_ENCODE_TABLE);
    setDecodeTableRaw(isStandardEncodeTable || isUrlSafe ? DECODE_TABLE : 
calculateDecodeTable(encodeTable));
    return super.setEncodeTable(encodeTable);
} {code}
The padding-byte check needs to happen when the final `Base64` instance is 
built, because padding is configurable and the builder setter order can vary.
 
*Reproducer*
The following regression tests describe the expected behavior:
{code:java}
@Test
void testBuilderCustomEncodeTableRejectsDuplicateEntries() {
    final byte[] encodeTable =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
                    .getBytes(StandardCharsets.US_ASCII);
    encodeTable[1] = encodeTable[0];

    assertThrows(IllegalArgumentException.class, () -> 
Base64.builder().setEncodeTable(encodeTable));
}

@Test
void testBuilderCustomEncodeTableRejectsNonAsciiEntries() {
    final byte[] encodeTable =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
                    .getBytes(StandardCharsets.US_ASCII);
    encodeTable[0] = (byte) 0x80;

    // Assumes the "reject non-ASCII" direction. If non-ASCII support is chosen 
instead,
    // assert a successful round-trip rather than IllegalArgumentException.
    assertThrows(IllegalArgumentException.class, () -> 
Base64.builder().setEncodeTable(encodeTable));
}

@Test
void testBuilderCustomEncodeTableRejectsPaddingByte() {
    final byte[] encodeTable =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
                    .getBytes(StandardCharsets.US_ASCII);
    encodeTable[0] = '=';

    assertThrows(IllegalArgumentException.class, () -> 
Base64.builder().setEncodeTable(encodeTable).get());
} {code}
*Observed behavior*
Duplicate alphabet byte:
{code:java}
duplicate[1] = duplicate[0]
encode({0}) -> "AA"
decode("AA") -> [4]
round-trip -> false {code}
Non-ASCII alphabet byte:
{code:java}
nonAscii[0] = (byte) 0x80
Base64.builder().setEncodeTable(nonAscii).get()
-> java.lang.ArrayIndexOutOfBoundsException: Index -128 out of bounds for 
length 256 {code}
Padding byte in alphabet:
{code:java}
pad[0] = '='
encode({0}) -> "=="
decode("==") -> []
round-trip -> false {code}
*Expected behavior*

`Base64.Builder#setEncodeTable(...)` should reject invalid custom alphabets 
with a clear `IllegalArgumentException`.

At minimum, the encode table should:

1. Have exactly 64 entries.
2. Contain no duplicate byte values.
3. Not contain the final configured padding byte.
4. Either reject non-ASCII byte values clearly, or fully support them using 
unsigned lookup consistently in decode-table construction, decode, and 
`isInAlphabet`.

For consistency with the sibling Base classes, rejecting invalid alphabets 
during builder configuration or final instance construction is preferable to 
silently building a codec that corrupts round-trips.
Base64 exposes a public builder API for custom alphabets. The resulting codec 
should either be valid and self-consistent, or reject invalid configuration. It 
should not silently produce a codec that cannot decode its own output, and it 
should not fail with an internal `ArrayIndexOutOfBoundsException` for invalid 
alphabet input.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to