[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970156#comment-15970156
 ] 

Karl Wright commented on CONNECTORS-1410:
-

[~kamaci] Please go ahead and commit to trunk.


> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch, 
> CONNECTORS-1410.patch, CONNECTORS-1410.patch
>
>
> Previously, we indexed e-mails and their attachments together. CONNECTORS-1375 
> changed this logic so that an e-mail and its attachments are indexed 
> separately.
> However, there is a problem: the content field of an e-mail that has 
> attachment(s) includes both the body and the attachments' binary content as 
> plain text.
> Since attachments are now indexed separately, we can index just the body as 
> content instead of appending the e-mail body and all attachments' binary data 
> as plain text.
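A minimal sketch of the behavior described above. It assumes a simplified, hypothetical Part model (disposition, MIME type, text) in place of the javax.mail types the connector actually uses. Only inline text/* parts contribute to the e-mail's content. Parts whose disposition is "attachment" are skipped, because they are indexed as separate documents. Treating the disposition value as the only attachment signal is itself a simplification (requires Java 16+ for records).

{code:java}
import java.util.List;

public class BodyOnlyContent {
    // Hypothetical, simplified stand-in for a MIME body part; the real
    // connector works with javax.mail types, which are not modeled here.
    record Part(String disposition, String mimeType, String text) {}

    // Builds the e-mail's indexed content from its inline text parts only.
    // Parts marked as attachments are skipped, since they are indexed as
    // separate documents rather than appended to the e-mail body.
    static String bodyContent(List<Part> parts) {
        StringBuilder sb = new StringBuilder();
        for (Part p : parts) {
            // equalsIgnoreCase(null) is false, so a missing disposition counts as inline.
            boolean isAttachment = "attachment".equalsIgnoreCase(p.disposition());
            if (!isAttachment && p.mimeType().startsWith("text/")) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(p.text());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Part> parts = List.of(
            new Part(null, "text/plain", "Hello, see the attached report."),
            new Part("attachment", "application/pdf", "%PDF-1.4 ..."));
        // prints "Hello, see the attached report."
        System.out.println(bodyContent(parts));
    }
}
{code}

With this approach the PDF bytes never reach the e-mail's content field. They appear only in the attachment's own indexed document.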



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970153#comment-15970153
 ] 

Furkan KAMACI commented on CONNECTORS-1410:
---

[~kwri...@metacarta.com] Oops, nice catch! I meant to keep it searchable but 
accidentally removed it from BASIC_SEARCHABLE_ATTRIBUTES.

> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch, 
> CONNECTORS-1410.patch


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970147#comment-15970147
 ] 

Karl Wright commented on CONNECTORS-1410:
-

[~kamaci] The patch looks good, except that I think you could still allow the body 
to be searchable; that would be fine.  Once that is restored, please go ahead 
and commit to trunk.  I will pull it up to the release branch.


> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch, 
> CONNECTORS-1410.patch


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970143#comment-15970143
 ] 

Furkan KAMACI commented on CONNECTORS-1410:
---

[~kwri...@metacarta.com], you are right. Could you check my latest patch? I've 
tested it and it works fine.

> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch, 
> CONNECTORS-1410.patch


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Karl Wright (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970111#comment-15970111
 ] 

Karl Wright commented on CONNECTORS-1410:
-

[~kamaci] I think having the BODY present twice in the indexed document is 
confusing and unnecessary.  Since we've already broken backwards compatibility 
for this connector, if we're going to index the body as the main document 
content, I think we might as well remove the body from all consideration for 
the metadata.  What do you think?
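One way to express this suggestion, as a sketch only. The field names ("body", "subject", "from") and the split() helper are hypothetical. The body becomes the document content and is filtered out of the metadata even if it was requested, so it is indexed once rather than twice.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class EmailFields {
    // Hypothetical field name used for illustration only.
    static final String BODY = "body";

    // Result of splitting an e-mail's fields: the body becomes the document
    // content, and the remaining requested fields become metadata.
    record Split(String content, Map<String, String> metadata) {}

    static Split split(Map<String, String> fields, Set<String> requestedMetadata) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            // The body is never emitted as metadata, so it appears only once
            // in the indexed document.
            if (!e.getKey().equals(BODY) && requestedMetadata.contains(e.getKey())) {
                metadata.put(e.getKey(), e.getValue());
            }
        }
        return new Split(fields.getOrDefault(BODY, ""), metadata);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("subject", "Quarterly report");
        fields.put("from", "alice@example.com");
        fields.put(BODY, "Hello, see the attached report.");
        Split s = split(fields, Set.of("subject", "from", BODY));
        System.out.println(s.content());  // Hello, see the attached report.
        System.out.println(s.metadata()); // {subject=Quarterly report, from=alice@example.com}
    }
}
{code}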


> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970107#comment-15970107
 ] 

Furkan KAMACI commented on CONNECTORS-1410:
---

[~kwri...@metacarta.com] Do you want me to remove the body from the included metadata?

> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970063#comment-15970063
 ] 

Furkan KAMACI commented on CONNECTORS-1410:
---

[~kwri...@metacarta.com] No, my point is that we already get the body via:

{code:java}
...
mbp.getContent().toString()
...
{code}

So I've just set it as the content too. You can check my updated patch. 
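The quoted mbp.getContent().toString() call yields the body text when the part is text/plain, where JavaMail returns a String. For a nested multipart it returns a Multipart object, whose toString() is generally not the message text. Below is a sketch of turning that body string into document content. It assumes the indexing side wants a stream plus its length in bytes. The Content record and the fromBody() helper are hypothetical.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BodyAsContent {
    // Hypothetical holder for what a connector would hand to the indexing
    // pipeline: a content stream plus its length in bytes.
    record Content(InputStream stream, long length) {}

    static Content fromBody(Object bodyContent) {
        // Mirrors the quoted mbp.getContent().toString() call; this is the body
        // text only when the part's content is a String (e.g. text/plain).
        String body = bodyContent.toString();
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        // The length must be the byte count, not body.length(), which counts
        // UTF-16 chars and differs for non-ASCII text.
        return new Content(new ByteArrayInputStream(bytes), bytes.length);
    }

    public static void main(String[] args) throws Exception {
        Content c = fromBody("Gr\u00fc\u00dfe");
        System.out.println(c.length()); // 7
        System.out.println(new String(c.stream().readAllBytes(), StandardCharsets.UTF_8));
    }
}
{code}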

> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch, CONNECTORS-1410.patch


[jira] [Commented] (CONNECTORS-1410) Binary Attachment Data as Plain Text at Email Content

2017-04-15 Thread Furkan KAMACI (JIRA)

[ 
https://issues.apache.org/jira/browse/CONNECTORS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969959#comment-15969959
 ] 

Furkan KAMACI commented on CONNECTORS-1410:
---

[~kwri...@metacarta.com] What do you think about this fix?

> Binary Attachment Data as Plain Text at Email Content
> -
>
> Key: CONNECTORS-1410
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1410
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Email connector
>Affects Versions: ManifoldCF 2.6
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: ManifoldCF 2.8
>
> Attachments: CONNECTORS-1410.patch