2011/4/20 Adam Warski <[email protected]>:
> Hello,
>
> I am trying to decode a filename of an attachment sent with Mail.app.
> Originally, the file is named "Żółw.rtf" (polish for Turtle.rtf).
> The headers are:
>
> --Apple-Mail-19-721116558
> Content-Disposition: attachment;
>     filename*=utf-8''Z%CC%87o%CC%81%C5%82w.rtf
> Content-Type: text/rtf;
>     x-unix-mode=0644;
>     name="=?utf-8?Q?Z=CC=87o=CC=81=C5=82w=2Ertf?="
> Content-Transfer-Encoding: 7bit
>
> So the corresponding javax.mail.Part.getFileName() returns
> "=?utf-8?Q?Z=CC=87o=CC=81=C5=82w=2Ertf?=".
>
> I tried decoding both with mime4j's DecoderUtil.decodeEncodedWords and
> JavaMail's MimeUtility.decodeText but the result is: "ZÃáoÃÅ≈Çw.rtf". Clearly
> not the original :).
>
> For comparison, MimeUtility.encodeText returns:
> =?UTF-8?Q?=C5=BB=C3=B3=C5=82w.rtf?=
> in contrast to:
> =?utf-8?Q?Z=CC=87o=CC=81=C5=82w=2Ertf?=
> coming from the e-mail.
>
> According to my research, the letter "Ż" can be encoded in two ways: either
> as a single letter or as "Z" + above-dot. MimeUtility.encodeText uses the
> former, Mail.app the latter.
Does this also mean that when Mail.app receives a MIME part with the name
=?utf-8?Q?Z=CC=87o=CC=81=C5=82w=2Ertf?= it correctly saves the attachment
as "Żółw.rtf"? That would surprise me. I'm not a Unicode guru (which is why
I'm asking you to run this test), but I'm not aware of any UTF-8 decoding in
which a UTF-8 sequence alters the preceding ASCII character, and the Z is
written as plain ASCII in that encoded string.

> Is there some way to properly decode the filename?

You can try using java.text.Normalizer. Let us know if this works.

Stefano

> Thanks!
>
> --
> Adam Warski
> http://www.warski.org
> http://www.softwaremill.eu
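
For reference, here is a minimal sketch of the java.text.Normalizer suggestion. It assumes the encoded word has already been decoded to a Java String as UTF-8, and it builds that string directly from the bytes shown in the header instead of calling mime4j or JavaMail. The class and method names (FilenameNormalizer, toComposed) are made up for illustration. NFC composition should turn "Z" + combining dot above (U+0307) into the single letter "Ż" (U+017B), and "o" + combining acute (U+0301) into "ó" (U+00F3).

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of mime4j or JavaMail.
class FilenameNormalizer {

    /** Composes base letters and combining marks into precomposed characters (NFC). */
    static String toComposed(String decoded) {
        return Normalizer.normalize(decoded, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Bytes carried by =?utf-8?Q?Z=CC=87o=CC=81=C5=82w=2Ertf?=
        byte[] raw = {
            'Z', (byte) 0xCC, (byte) 0x87,   // Z + U+0307 combining dot above
            'o', (byte) 0xCC, (byte) 0x81,   // o + U+0301 combining acute accent
            (byte) 0xC5, (byte) 0x82,        // U+0142 (l with stroke)
            'w', '.', 'r', 't', 'f'
        };

        // Assumption: the decoder interprets the bytes as UTF-8.
        String decomposed = new String(raw, StandardCharsets.UTF_8);
        String composed = toComposed(decomposed);

        System.out.println(decomposed.length()); // 10 chars: marks are separate
        System.out.println(composed.length());   // 8 chars after composition
        System.out.println(composed.equals("\u017B\u00F3\u0142w.rtf")); // true
    }
}
```

Normalization only helps if the string was decoded with the right charset in the first place. If the result still looks like "ZÃáoÃÅ≈Çw.rtf", the bytes may be getting decoded or displayed with a non-UTF-8 charset somewhere along the way; that is a guess, not something checked against the original setup.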
