[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970039#comment-15970039
 ] 

Furkan KAMACI commented on CONNECTORS-1410:
-------------------------------------------

[~kwri...@metacarta.com] This is how _body_ is already set in ManifoldCF:

{code:java}
Object o = msg.getContent();
if (o instanceof Multipart) {
  Multipart mp = (Multipart) msg.getContent();
  for (int k = 0, n = mp.getCount(); k < n; k++) {
    Part part = mp.getBodyPart(k);
    String disposition = part.getDisposition();
    if ((disposition == null)) {
      MimeBodyPart mbp = (MimeBodyPart) part;
      if (mbp.isMimeType(EmailConfig.MIMETYPE_TEXT_PLAIN)) {
        rd.addField(EmailConfig.EMAIL_BODY, mbp.getContent().toString());
      } else if (mbp.isMimeType(EmailConfig.MIMETYPE_HTML)) {
        rd.addField(EmailConfig.EMAIL_BODY, mbp.getContent().toString());
        // handle html accordingly. Returns content with html tags
      }
    }
  }
} else if (o instanceof String) {
  rd.addField(EmailConfig.EMAIL_BODY, (String)o);
}
{code}

The entire body is already read, so this problem exists even without this 
improvement. On the other hand, we now retrieve only the body; previously we 
streamed both the body and the attachments of the e-mail, which may be why 
the current code does not treat it as a problem. 

In pseudocode, my patch does the following:

{code:java}
rd.setContent(rd.getBody())
{code}
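
To make the pseudocode above concrete, here is a minimal, self-contained sketch of the idea. It does not reproduce the attached patch: the {{Doc}} class, its {{setBody}}/{{getBody}}/{{setContent}} methods and the {{useBodyAsContent}} helper are hypothetical stand-ins invented for illustration, not the real ManifoldCF document API, and UTF-8 encoding is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BodyAsContentSketch {

  /** Hypothetical stand-in for the document object; NOT the real ManifoldCF class. */
  static final class Doc {
    private String body = "";
    private InputStream content;
    private long contentLength;

    void setBody(String body) { this.body = body; }
    String getBody() { return body; }

    void setContent(InputStream content, long length) {
      this.content = content;
      this.contentLength = length;
    }
    InputStream getContent() { return content; }
    long getContentLength() { return contentLength; }
  }

  /**
   * Uses only the already-extracted body text as the document content,
   * so that attachment bytes are never appended to it.
   * Assumption: the body is encoded as UTF-8.
   */
  static void useBodyAsContent(Doc rd) {
    byte[] bytes = rd.getBody().getBytes(StandardCharsets.UTF_8);
    rd.setContent(new ByteArrayInputStream(bytes), bytes.length);
  }

  public static void main(String[] args) throws Exception {
    Doc rd = new Doc();
    rd.setBody("Hello, please see the attached report.");
    useBodyAsContent(rd);
    // The content stream now carries the body text only.
    System.out.println(rd.getContentLength() + " bytes of content");
  }
}
```

The point of the sketch is that the content is derived from the body field alone; attachments are left to be indexed as their own documents.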

> Binary Attachment Data as Plain Text at Email Content
> -----------------------------------------------------
>
>                 Key: CONNECTORS-1410
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Email connector
>    Affects Versions: ManifoldCF 2.6
>            Reporter: Furkan KAMACI
>            Assignee: Furkan KAMACI
>             Fix For: ManifoldCF 2.8
>
>         Attachments: CONNECTORS-1410.patch
>
>
> Previously, we indexed e-mails and their attachments together. CONNECTORS-1375 
> changed this logic so that an e-mail and its attachments are indexed 
> separately.
> However, there is a problem: the content field of an e-mail that has 
> attachment(s) includes both the body and the attachments' binary content as 
> plain text.
> As we index attachments separately, we can index just the body as content 
> instead of appending the e-mail body and all attachments' binary data as plain text.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
