Peter Maydell <peter.mayd...@linaro.org> writes:

> On 6 February 2013 09:06, Markus Armbruster <arm...@redhat.com> wrote:
>> As far as I can tell, it never fails, but silently ignores characters
>> outside the alphabet [A-Za-z0-9+/]
>
> This bit at least is required behaviour: see RFC2045 section 6.8:
>
>   Any characters outside of the base64 alphabet are to be ignored in
>   base64-encoded data.
>
> (thanks to Tony Finch for pointing that one out to me.)
RFC 2045 is "Multipurpose Internet Mail Extensions (MIME) Part One:
Format of Internet Message Bodies".  As such, it specifies Base64 as a
MIME *transfer encoding*, and its "ignore non-alphabet characters" rule
is specific to that use.  The general specification is RFC 3548 "The
Base16, Base32, and Base64 Data Encodings":

    2.3.  Interpretation of non-alphabet characters in encoded data

       Base encodings use a specific, reduced, alphabet to encode binary
       data.  Non alphabet characters could exist within base encoded
       data, caused by data corruption or by design.  Non alphabet
       characters may be exploited as a "covert channel", where
       non-protocol data can be sent for nefarious purposes.  Non
       alphabet characters might also be sent in order to exploit
       implementation errors leading to, e.g., buffer overflow attacks.

       Implementations MUST reject the encoding if it contains
       characters outside the base alphabet when interpreting base
       encoded data, unless the specification referring to this document
       explicitly states otherwise.  Such specifications may, as MIME
       does, instead state that characters outside the base encoding
       alphabet should simply be ignored when interpreting data ("be
       liberal in what you accept").  Note that this means that any CRLF
       constitute "non alphabet characters" and are ignored.
       Furthermore, such specifications may consider the pad character,
       "=", as not part of the base alphabet until the end of the
       string.  If more than the allowed number of pad characters are
       found at the end of the string, e.g., a base 64 string terminated
       with "===", the excess pad characters could be ignored.

    8.  Security Considerations

       [...]

       If non-alphabet characters are ignored, instead of causing
       rejection of the entire encoding (as recommended), a covert
       channel that can be used to "leak" information is made possible.
       The implications of this should be understood in applications
       that do not follow the recommended practice.

       [...]
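To make the difference between the two rules concrete, here is a
minimal sketch using Python's standard base64 module, chosen purely for
illustration; it is not the decoder this thread is about.  The helper
names decode_lenient and decode_strict are hypothetical.  The lenient
variant follows the MIME rule (discard non-alphabet characters), the
strict one follows the RFC 3548 default (reject them), and the example
shows how lenient decoding opens the covert channel from section 8.

```python
import base64
import binascii


def decode_lenient(data: str) -> bytes:
    """MIME-style decoding (RFC 2045 section 6.8): characters outside
    the base64 alphabet are silently discarded before decoding."""
    return base64.b64decode(data, validate=False)


def decode_strict(data: str) -> bytes:
    """RFC 3548 section 2.3 default: raise binascii.Error if the input
    contains anything outside [A-Za-z0-9+/] plus trailing '=' padding."""
    return base64.b64decode(data, validate=True)


if __name__ == "__main__":
    clean = "aGVsbG8="        # base64 of b"hello"
    tainted = "aGVs.bG8=\n"   # same payload plus '.' and a newline

    # Lenient decoding cannot tell the two inputs apart: both give
    # b"hello".  Whatever the extra '.' encodes (its presence, its
    # position) passes through unnoticed: the covert channel.
    print(decode_lenient(clean), decode_lenient(tainted))

    # Strict decoding accepts the clean input and rejects the other.
    print(decode_strict(clean))
    try:
        decode_strict(tainted)
    except binascii.Error as exc:
        print("rejected:", exc)
```

Note that the newline in the tainted string is also rejected by the
strict decoder, which matches the RFC's remark that CRLF counts as a
non-alphabet character.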