On 19 March 2013 18:32, Gary Gregory <garydgreg...@gmail.com> wrote: > Yes, why not depend on commons-codec?
Codec is nearly 4 times the size of file upload. Not an ideal partner. > Gary > > On Mar 19, 2013, at 14:28, Simone Tripodi <simonetrip...@apache.org> wrote: > >> Hi there, >> >>>> is there anything you want to discuss before I go ahead with a new RC? >>>> I am performing some ITs in non-public projects and I planned to cut a >>>> new RC in 1-2 days... >>> >>> I think there are still some issues in Base64Decoder (this time not >>> ones I accidentally introduced - sorry about that!). >> >> don't worry, it can happen - unit tests saved us :) >> >>> >>> For example, it does not seem to handle whitespace properly - >>> certainly not in the last 4 bytes. >>> Nor does it detect invalid characters in input - AFAICT these will be >>> treated as binary (because the decoding table is initialised to 0). >> >> I am honestly not worried about white spaces, since those decoders are >> not part of APIs and are used by the MimeUtility which tokenizes >> strings by whitespaces >> >>> With the new version of the code, skipped characters can cause array >>> bounds errors after the main loop, because finish is calculated >>> without taking skipped characters into account, thus the loop can >>> consume additional bytes. In the original version, this would not >>> cause array bounds errors, but the same byte could be used more than >>> once. >> >> that is worrying >> >>> I think it really needs to be rewritten, which I am happy to do. >>> I intend to fetch the bytes, skipping as necessary otherwise storing >>> them in an array. When there are 3 bytes available, output them and >>> resume. >>> At end of input, output any remaining bytes. >>> >>> It remains to be decided how to handle '=' before the end of the byte >>> data - does that signal end of input, or is it an error if '=' appears >>> other than at the end? >>> Indeed, should it handle whitespace at all, and if so are there >>> restrictions where that may appear? >>> >>> QPD has problems too: it assumes that characters following '=' are >>> hex, but does not validate this. >>> It also fails to handle lowercase hex, which is not very tolerant. >>> Also the exception messages aren't specific enough - cannot >>> distinguish truncated encoded sequence from invalid CRLF pair. >>> Again, I can take this on. >> >> I won't appose to a rewrite, but at that point, since both decoders >> are buggy: what about adding a dependency to commons-coded? It already >> handles both Base64 and QuotedPrintable codecs. >> We could shading the interested classes, if we want to avoid less >> dependencies as possible, but a dependency to another commons >> component sounds fair to me. >> >>> Given that MimeUtil seems to strip spaces on input, do the decoders >>> need to handle embedded spaces? >> >> not in my opinion, as I wrote before. >> >> Thanks and best, >> -Simo >> >> http://people.apache.org/~simonetripodi/ >> http://simonetripodi.livejournal.com/ >> http://twitter.com/simonetripodi >> http://www.99soft.org/ >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org