On 19 March 2013 18:27, Simone Tripodi <simonetrip...@apache.org> wrote:
> Hi there,
>
>>> is there anything you want to discuss before I go ahead with a new RC?
>>> I am performing some ITs in non-public projects and I planned to cut a
>>> new RC in 1-2 days...
>>
>> I think there are still some issues in Base64Decoder (this time not
>> ones I accidentally introduced - sorry about that!).
>>
>
> don't worry, it can happen - unit tests saved us :)
>
>>
>> For example, it does not seem to handle whitespace properly -
>> certainly not in the last 4 bytes.
>> Nor does it detect invalid characters in input - AFAICT these will be
>> treated as binary (because the decoding table is initialised to 0).
>>
>
> I am honestly not worried about white spaces, since those decoders are
> not part of APIs and are used by the MimeUtility which tokenizes
> strings by whitespaces
>
>> With the new  version of the code, skipped characters can cause array
>> bounds errors after the main loop, because finish is calculated
>> without taking skipped characters into account, thus the loop can
>> consume additional bytes. In the original version, this would not
>> cause array bounds errors, but the same byte could be used more than
>> once.
>>
>
> that is worrying
>
>> I think it really needs to be rewritten, which I am happy to do.
>> I intend to fetch the bytes, skipping as necessary otherwise storing
>> them in an array. When there are 3 bytes available, output them and
>> resume.
>> At end of input, output any remaining bytes.
>>
>> It remains to be decided how to handle '=' before the end of the byte
>> data - does that signal end of input, or is it an error if '=' appears
>> other than at the end?
>> Indeed, should it handle whitespace at all, and if so are there
>> restrictions where that may appear?
>>
>> QPD has problems too: it assumes that characters following '=' are
>> hex, but does not validate this.
>> It also fails to handle lowercase hex, which is not very tolerant.
>> Also the exception messages aren't specific enough - cannot
>> distinguish truncated encoded sequence from invalid CRLF pair.
>> Again, I can take this on.
>>
>
> I won't appose to a rewrite, but at that point, since both decoders
> are buggy: what about adding a dependency to commons-coded? It already
> handles both Base64 and QuotedPrintable codecs.
> We could shading the interested classes, if we want to avoid less
> dependencies as possible, but a dependency to another commons
> component sounds fair to me.

Well, codec is about 260 kb, whereas fileupload is 70 kb.

If fileupload is ever used on a device with limited memory
(smartphone) that would be a huge disadvantage, unless the application
happened to use codec already.

>> Given that MimeUtil seems to strip spaces on input, do the decoders
>> need to handle embedded spaces?
>
> not in my opinion, as I wrote before.

OK, removing that support would fix the end of buffer bugs and
simplify the code.

> Thanks and best,
> -Simo
>
> http://people.apache.org/~simonetripodi/
> http://simonetripodi.livejournal.com/
> http://twitter.com/simonetripodi
> http://www.99soft.org/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to