Hi,

Since I have no control over the email clients that send these mails, please suggest suitable measures I can take on my end to mitigate the character corruption.
I think substituting the charset while decoding the email body would work for such cases. Can somebody point me to the relevant mime4j API hooks for this idea, and is it feasible at all?

Thanks
Ashish

-----Original Message-----
From: Tze-Kei Lee [mailto:[email protected]]
Sent: Monday, February 13, 2012 5:45 PM
To: [email protected]
Subject: Re: Character corruption with Traditional chinese

Hi,

It looks like the email client that composed the email made a mistake when picking the charset. GB 2312 contains only Simplified Chinese, while CP 936 (GBK) and GB 18030 extend it to include Traditional Chinese (and Japanese and Korean), and the first sentence in the email uses those extended code points.

Best Regards
Tze-Kei

On Mon, Feb 13, 2012 at 7:32 PM, Sharma, Ashish <[email protected]> wrote:
> Hi,
>
> I use mime4j 0.7.2 for email parsing.
>
> I am getting character set corruption for Traditional Chinese
> characters.
>
> A sample email that causes the problem is at:
>
> http://pastebin.com/Q38VXsLb
>
> I noticed that when the email is parsed with the default charset encoding
> (the charset encoding received from the email server):
>
> charset="gb2312"
>
> I get the character set corruption, while if I manually change this charset
> encoding in the email stream to:
>
> charset="gb18030"
>
> and then parse it via mime4j, there is no character corruption.
>
> Can somebody please explain why I am getting this behavior?
>
> Moreover, is there a way in mime4j to substitute character sets in
> specific cases like the one above?
>
> Thanks
> Ashish
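A minimal sketch of the charset substitution asked about above, written with plain JDK classes so that it does not depend on any particular mime4j hook. It assumes, following Tze-Kei's explanation, that text labelled gb2312 (or gbk) can be decoded safely as GB18030, which was designed as a superset of both. `CharsetSubstitution` and its methods are hypothetical names, and the substitution table is an assumption you would extend for other mislabelled charsets. Two options are shown: decoding body bytes with the substituted charset, or rewriting the declared charset in the raw message before parsing (the manual change that fixed the sample email).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper, not part of mime4j: works around mail clients that label
 * GB 18030 / GBK text as gb2312 by decoding with the superset charset instead.
 */
final class CharsetSubstitution {

    // Matches a charset=gb2312 parameter (quoted or not, any letter case).
    private static final Pattern GB2312_PARAM =
            Pattern.compile("(?i)(charset\\s*=\\s*\"?)gb2312\\b");

    private CharsetSubstitution() {
    }

    /**
     * Maps a declared MIME charset label to the charset actually used for decoding.
     * Assumed substitution table: gb2312 and gbk are decoded as GB18030;
     * every other label is used as declared.
     */
    static Charset effectiveCharset(String declared) {
        if (declared == null) {
            return StandardCharsets.US_ASCII; // RFC 2045 default for text parts
        }
        String name = declared.trim();
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.equals("gb2312") || lower.equals("gbk")) {
            name = "GB18030";
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported label: fall back to the RFC 2045 default.
            return StandardCharsets.US_ASCII;
        }
    }

    /** Option 1: decode already transfer-decoded body bytes with the substituted charset. */
    static String decodeBody(byte[] bodyBytes, String declaredCharset) {
        return new String(bodyBytes, effectiveCharset(declaredCharset));
    }

    /**
     * Option 2: rewrite charset=gb2312 to charset=gb18030 in the raw message
     * before handing it to the parser. ISO-8859-1 maps each byte to exactly one
     * char and back, so non-ASCII body bytes survive the round trip unchanged.
     */
    static byte[] rewriteDeclaredCharsets(byte[] rawMessage) {
        String text = new String(rawMessage, StandardCharsets.ISO_8859_1);
        String rewritten = GB2312_PARAM.matcher(text).replaceAll("$1gb18030");
        return rewritten.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

With option 2, the rewritten bytes can be wrapped in a `java.io.ByteArrayInputStream` and passed to whatever mime4j parsing entry point you already use. Two caveats: the regex runs over the whole message, so body text that literally contains `charset=gb2312` would be rewritten too, and RFC 2047 encoded-word headers such as `=?gb2312?B?...?=` are not matched and would need a similar rule. Option 1 touches only decoding, but you have to obtain the transfer-decoded body bytes and the declared charset from mime4j yourself; in 0.7.x, `TextBody.getMimeCharset()` and the body's `getInputStream()` appear to be the relevant hooks, but check the Javadoc for your version.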
