Re: Reading iso-8859-1 TextPart content

Oleg Kalnichevski Fri, 15 May 2009 08:38:00 -0700

On Fri, May 15, 2009 at 05:35:25PM +0200, Markus Wiederkehr wrote:
> On Fri, May 15, 2009 at 12:02 AM, Alejandro Valdez
> <[email protected]> wrote:
> > Hi list, I'm using mime4j to extract the text content from an
> > e-mail's text/html parts. I found that sometimes there are
> > non-standard MIME parts that use iso-8859-1 characters (e.g.
> > accented vowels) but don't declare any charset in the part's
> > MIME header.
> >
> > In those cases mime4j creates a Reader that uses us-ascii as the
> > charset (which is what should be done when there is no charset
> > declaration in the header). Reading the content from that Reader
> > produces a char[] with the Unicode U+FFFD replacement character
> > in place of the non-us-ascii characters.
> >
> > Does anyone know a way to use the mime4j API to return a Reader
> > with the iso-8859-1 charset set, or some other solution to this
> > (maybe common) problem?
> 
> It looks indeed like this is not possible.
> 
> For Mime4j 0.7 I would propose that we pull up getInputStream() from
> BinaryBody to SingleBody so that TextBody gets this method too.
> 
> If that's okay I can open a JIRA and fix the issue.
>


+1

Oleg


> > This is the way I'm reading a TextPart content:
> >
> > TextBody textBody = (TextBody) part.getBody();
> > Reader reader = textBody.getReader();
> > char[] buffer = new char[16000];
> > StringBuilder sb = new StringBuilder();
> >
> > int bytesReaded = 1;
> > while (bytesReaded != -1) {
> >     bytesReaded = reader.read(buffer, 0, buffer.length);
> >     if (bytesReaded != -1) {
> >         sb.append(buffer, 0, bytesReaded);
> >     }
> > }
> > return sb.toString();
> 
> Looks like you want to convert the TextBody to a String. How about this:
> 
>         TextBody textBody = (TextBody) part.getBody();
>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>         textBody.writeTo(baos);
>         return new String(baos.toByteArray(), "iso-8859-1");
> 
> hth
> Markus
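
A minimal JDK-only sketch of the effect described above, in case it helps
anyone reading the archive. It makes no mime4j calls: the class name
CharsetFallbackDemo and the helper decode() are made up for illustration,
and StandardCharsets assumes Java 7 or later. It decodes the same raw
bytes once through a us-ascii Reader and once through an iso-8859-1
Reader, which is roughly what happens when the part declares no charset
versus when the charset is forced by hand.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {

    // Hypothetical helper: decodes raw bytes through a Reader using the
    // given charset, the same way the loop in the original post reads.
    public static String decode(byte[] raw, Charset charset) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader reader =
                 new InputStreamReader(new ByteArrayInputStream(raw), charset)) {
            int charsRead;
            while ((charsRead = reader.read(buffer, 0, buffer.length)) != -1) {
                sb.append(buffer, 0, charsRead);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café" encoded as iso-8859-1: the é is the single byte 0xE9.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};

        String asAscii = decode(raw, StandardCharsets.US_ASCII);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // 0xE9 is not valid us-ascii, so the decoder substitutes U+FFFD.
        System.out.println(asAscii.indexOf('\uFFFD'));      // prints 3
        // The iso-8859-1 decoder maps 0xE9 to U+00E9 (é).
        System.out.println(asLatin1.equals("caf\u00E9"));   // prints true
    }
}
```

The point is that the original byte is already lost by the time the
us-ascii Reader hands back its chars, so the fix has to start from the
raw bytes, as in Markus's writeTo() suggestion.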
