Re: RFR 8025003: Base64 should be less strict with padding

Xueming Shen Tue, 12 Nov 2013 21:21:40 -0800

On 11/12/13 8:21 PM, Bill Shannon wrote:

Xueming Shen wrote on 11/12/2013 04:25 PM:

On 11/12/2013 03:32 PM, Bill Shannon wrote:

This still seems like an inconsistent, and inconvenient, approach to me.


You've decided that some encoding errors (i.e., missing pad characters)
can be ignored.  You're willing to assume that the missing characters aren't
missing data but just missing padding.  But if you find a padding character
where you don't expect it you won't assume that the missing data is zero.

"Missing pad characters" are, in theory, not an encoding error. As the RFC
suggests, the use of padding in base64 data is not required or used. The pad
characters mainly serve to indicate the "end of the data". This is why the
padding character(s) are not required (they are optional) by our decoder in
the first place. However, if the padding character(s) are present, they need
to be correctly encoded; otherwise, it's a malformed base64 stream.
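
To make the "padding is optional" point concrete, here is a minimal sketch using the java.util.Base64 class as it shipped. The class name PaddingOptionalDemo and the helper decode are made up for illustration, and the sketch only exercises inputs whose padding is either fully present or fully absent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of the JDK or of the proposed change.
class PaddingOptionalDemo {
    // Decodes with the basic java.util.Base64 decoder and returns the bytes as text.
    static String decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("YQ=="));  // final unit with both pad characters
        System.out.println(decode("YQ"));    // same unit with the padding left off
        System.out.println(decode("YWI="));  // final unit with one pad character
        System.out.println(decode("YWI"));   // same unit with the padding left off
    }
}
```

Each padded/unpadded pair decodes to the same bytes, which is the sense in which the decoder treats padding as optional.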

I think we're interpreting the spec differently.

I meant to say "The RFC says the use of padding in base64 data is not required nor used, in some circumstances".

I interpret it as the padding is optional in some circumstances.

-Sherman


If the padding characters are not needed, why define them at all?
What advantage would there be in defining characters that convey no
information?  Why not let the data just end wherever it ends, throwing
away unused bits?

The padding characters are required.  If they're missing, you have no
idea if the encoder just left them out, or if the data was truncated
or corrupted.

I understand the desire to check that the data is encoded exactly the
way the spec says it should be encoded, and to consider it an error
otherwise.  This is the "strict" approach.  But that's not what you're
doing.  You're deciding that you care about some kinds of errors but
not all kinds of errors.  That's a judgment call that, as far as I can
tell, is not based on real experience with encoded data.

To address your strong request for a more "lenient" MIME decoder, we have
updated the spec and implementation to be reasonably liberal about incorrect
padding at the end of the MIME base64 data, as shown below:

      xxxx =       unnecessary padding character at the end of encoded stream
      xxxx xx=     missing the last padding character
      xxxx xx=y    missing the last padding character, instead having a
                   non-padding char

The assumption is that this still follows the "spirit" of the purpose of the
padding character (as suggested by the RFC), which is to indicate the end of
the data stream: no more decoding is needed beyond the padding character.
Yes, it makes the MIME decoder somewhat "inconsistent" with our original
design and with the other types of decoders, but we thought it might provide
the "convenience" requested.
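
The "nothing needs decoding beyond the padding character" rule can be sketched as follows. This is not the code from the webrev; StopAtPaddingSketch and decodeUpToPadding are hypothetical names, and the sketch simply cuts the input at the first '=' and hands the rest to the basic java.util.Base64 decoder, which accepts an unpadded final unit.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of "stop at the first padding character".
class StopAtPaddingSketch {
    static byte[] decodeUpToPadding(String mime) {
        // Drop whitespace such as the line breaks found in MIME bodies.
        String compact = mime.replaceAll("\\s", "");
        // Treat the first '=' as the end of the data, whatever follows it.
        int pad = compact.indexOf('=');
        String data = (pad >= 0) ? compact.substring(0, pad) : compact;
        // Still throws IllegalArgumentException for other malformed input,
        // for example a single dangling character in the last unit.
        return Base64.getDecoder().decode(data);
    }

    public static void main(String[] args) {
        String[] samples = { "YWJj =", "YWJj YQ=", "YWJj YQ=y" };
        for (String s : samples) {
            byte[] out = decodeUpToPadding(s);
            System.out.println(s + " -> " + new String(out, StandardCharsets.UTF_8));
        }
    }
}
```

The three samples mirror the three cases in the table above, with "YWJj" standing in for a complete "xxxx" unit and "YQ" for the partial "xx" unit.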

But a single dangling byte at the end of the encoded data stream is obviously
an encoding error or a transportation error. As I said, I don't think the
decoder should try to rescue it with a guess. The proposed change tries to
provide a simple mechanism with which the application can do some
lifecycle/error management to recover from the malformed data stream, if
desired. This is actually NOT what j.u.Base64 is designed for. The primary
goal is to provide a set of easy/simple utility methods for base64
encoding/decoding, not the kind of complicated error recovery management
that java.nio.charset.De/Encoder provides.
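
For reference, a small sketch of how the dangling-character case looks to an application using the shipped java.util.Base64 basic decoder. DanglingCharDemo and isRejected are hypothetical names; the only JDK behaviour relied on is that decode throws IllegalArgumentException for input that is not valid base64.

```java
import java.util.Base64;

// Hypothetical demo class showing how a caller observes a malformed tail.
class DanglingCharDemo {
    // Returns true if the basic decoder refuses the input.
    static boolean isRejected(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            // A single character carries only 6 bits, so no byte can be produced.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("YWJj  rejected: " + isRejected("YWJj"));   // complete unit
        System.out.println("YWJjY rejected: " + isRejected("YWJjY"));  // one dangling char
    }
}
```

With this API the caller learns only that the input was bad; any recovery of the bytes before the bad unit is left to the application.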

There's really no error recovery possible, and certainly no program is
going to attempt error recovery.  As I said, there are only two reasonable
things to do:  1) throw up your hands, claim the data is corrupt, and tell
the user there's nothing you can do, or 2) do your best job to give the user
as much data as possible, and let the user decide if the data is in fact
corrupt.  I'd be happy for you to provide options to do both.  Doing
something that's half way between the two just isn't useful.
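
A rough sketch of what offering both options could look like, assuming a hand-written decoder rather than anything in the JDK or in JavaMail. All names here (LenientBase64Sketch, Result, decodeLenient, suspect) are hypothetical. The decoder never throws: it returns every byte it can recover and sets a flag whenever the input deviates from properly padded base64, so a strict caller can reject flagged input (option 1) and a lenient caller can use the data anyway (option 2).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical best-effort decoder; an illustration only.
class LenientBase64Sketch {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Bytes recovered, plus a flag saying whether anything looked wrong.
    static final class Result {
        final byte[] data;
        final boolean suspect;
        Result(byte[] data, boolean suspect) {
            this.data = data;
            this.suspect = suspect;
        }
    }

    static Result decodeLenient(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int bits = 0;               // 6-bit values accumulated for the current unit
        int count = 0;              // number of values in the current unit (0..3)
        boolean suspect = false;
        boolean sawPadding = false;
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '=') {
                // Padding ends the data. Check how much padding there is and
                // whether anything other than whitespace follows it.
                sawPadding = true;
                int pads = 0;
                for (int j = i; j < encoded.length(); j++) {
                    char r = encoded.charAt(j);
                    if (r == '=') pads++;
                    else if (!Character.isWhitespace(r)) suspect = true;
                }
                int expected = (count == 2) ? 2 : (count == 3) ? 1 : 0;
                if (pads != expected) suspect = true;
                break;
            }
            int v = ALPHABET.indexOf(c);
            if (v < 0) {
                // Skip characters outside the alphabet; only whitespace is benign.
                if (!Character.isWhitespace(c)) suspect = true;
                continue;
            }
            bits = (bits << 6) | v;
            if (++count == 4) {     // a full unit yields three bytes
                out.write(bits >> 16);
                out.write(bits >> 8);
                out.write(bits);
                bits = 0;
                count = 0;
            }
        }
        if (count != 0 && !sawPadding) suspect = true;   // padding left off or data truncated
        if (count == 3) {           // 18 bits: two whole bytes
            out.write(bits >> 10);
            out.write(bits >> 2);
        } else if (count == 2) {    // 12 bits: one whole byte
            out.write(bits >> 4);
        } else if (count == 1) {    // 6 bits: nothing recoverable from this unit
            suspect = true;
        }
        return new Result(out.toByteArray(), suspect);
    }

    public static void main(String[] args) {
        Result r = decodeLenient("YWJjYQ=y");
        System.out.println(new String(r.data, StandardCharsets.UTF_8) + " suspect=" + r.suspect);
    }
}
```

A strict wrapper would just throw when suspect is true, so the two behaviours share one code path and the choice is left to the caller.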

The JavaDoc can definitely be improved to provide a detailed use case or
sample, if that helps. But if it's definitely a no-go, maybe we can leave
this for JDK 9 and a bigger surgery.

Without support for error-free decoding, there's little motivation for me
to ever convert JavaMail to use this new capability.
