On Mon, Aug 17, 2009 at 3:22 PM, Aron Wieck<[email protected]> wrote:
>> > assertEquals("Test ü and more", DecoderUtil.decodeEncodedWords("Test
>> > =?ISO-8859-1?Q?=FC_?= =?ISO-8859-1?Q?and_more?="));
>>
>> Coincidentally the same problem has been reported yesterday by Wim
>> Jongman. Funny how bugs like this can somehow remain undetected for
>> years and then show up all of a sudden..
>>
> This then qualifies as a Schroedinbug:
> http://catb.org/~esr/jargon/html/S/schroedinbug.html

:-)

>> > After this fix there is only one space between "ü" and "and", which I
>> > think
>> > is not correct (but I'm not sure).
>>
>> No I think one space would be correct, see MIME4J-104.
>>
> My bad! Sorry.
>
>> > Proposed Solution:
>> >
>> > Replace "indexOf" by Regex matching, like so:
>> > [...]
>>
>> I'm afraid that would reintroduce MIME4J-104..
>>
>
> If you are interested I could write a regex based version which will not
> reintroduce the double space bug.
> I'ld use the regex to extract charset, encoding and encoded string in one
> go. I think it will be at least as fast as the current method.
> However, java.util.regex requires Java 1.4, if that's a no-go I won't
> bother.

Regex wouldn't be a problem since Mime4j already depends on Java 5.

I'm not sure how a regex solution could compete with a few indexOf and
substring calls in terms of speed though. I mean Pattern.compile()
alone has to build a DFA from the input string.

I'd like to give it a try by refactoring and fixing the existing code.

Markus

> Thanks for your quick response.
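
For reference, here is a minimal sketch of the regex idea discussed in this thread. It is only an illustration, not Mime4j's actual implementation: the class name EncodedWordSketch and its helper methods are hypothetical, and java.util.Base64 requires Java 8, which is newer than the Java 5 baseline mentioned in the thread. The pattern is compiled once into a static field, so the cost of Pattern.compile() is paid at class load time and not on every call. Whitespace between two adjacent encoded words is dropped, which gives the single space expected by the test string.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Mime4j DecoderUtil.
public class EncodedWordSketch {

    // Compiled once. Groups: 1 = charset, 2 = encoding (B or Q), 3 = encoded text.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([bBqQ])\\?([^?\\s]*)\\?=");

    public static String decodeEncodedWords(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = ENCODED_WORD.matcher(input);
        int last = 0;
        boolean previousWasEncoded = false;

        while (m.find()) {
            String gap = input.substring(last, m.start());
            // Whitespace that only separates two encoded words is ignored.
            if (!(previousWasEncoded && gap.trim().isEmpty())) {
                out.append(gap);
            }
            String decoded = decodeWord(m.group(1), m.group(2), m.group(3));
            if (decoded != null) {
                out.append(decoded);
                previousWasEncoded = true;
            } else {
                // Undecodable word: keep the raw text unchanged.
                out.append(m.group());
                previousWasEncoded = false;
            }
            last = m.end();
        }
        out.append(input.substring(last));
        return out.toString();
    }

    // Returns null if the charset is unknown or the payload is malformed.
    private static String decodeWord(String charsetName, String encoding, String text) {
        try {
            Charset charset = Charset.forName(charsetName);
            byte[] bytes = encoding.equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(text)
                    : decodeQ(text);
            return new String(bytes, charset);
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    // Q encoding: '_' stands for a space, "=XX" for the byte with hex value XX.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo);
                    i += 2;
                } else {
                    bytes.write(c); // malformed escape, keep literally
                }
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String input = "Test =?ISO-8859-1?Q?=FC_?= =?ISO-8859-1?Q?and_more?=";
        System.out.println(decodeEncodedWords(input));
    }
}
```

Whether this is actually faster or slower than a few indexOf and substring calls would have to be measured; the sketch only shows that charset, encoding and encoded text can be extracted in one match.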
