[ 
https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057881#comment-16057881
 ] 

Karl Wright commented on CONNECTORS-1433:
-----------------------------------------

I've never been clear on whether the ES connector is using the mapper 
attachment correctly or not.  The content is binary (not text) and ES doesn't 
do its own Tika extraction of the binary, so I can see why this might be 
difficult.  But assuming we can convert directly to text isn't going to 
work either, because we primarily output binary content.

The big question is: what is a better way to view this problem?

(1) If ES can only accept *text* output, then we should reject all content that 
isn't text, and we should *not* convert to base64.  That would force people 
generally to use the Tika transformer with the ES output connector.
(2) If the mapper attachment can do some kinds of conversions, and it can 
convert base64 back to characters, then we can leave things as they are.
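
To make the two options concrete, here is a rough sketch of what each policy 
would mean for a single document.  This is not the ES connector's actual code: 
the class and method names are hypothetical, the "text/*" MIME test is an 
assumption about how we would recognize text, and UTF-8 is assumed as the 
character encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of ManifoldCF. */
final class EsContentPolicy {

    /** Assumption: only text/* MIME types count as text. */
    static boolean isText(String mimeType) {
        return mimeType != null
            && mimeType.toLowerCase(Locale.ROOT).startsWith("text/");
    }

    /**
     * Option (1): pass text through as characters and reject everything else.
     * An empty result means the document is rejected, which in practice
     * would require a Tika transformer earlier in the pipeline.
     */
    static Optional<String> textOnly(String mimeType, byte[] content) {
        if (!isText(mimeType)) {
            return Optional.empty();
        }
        // Assumption: text content is UTF-8 encoded.
        return Optional.of(new String(content, StandardCharsets.UTF_8));
    }

    /**
     * Option (2): always send base64 and rely on the receiving side
     * to decode and convert it.
     */
    static String base64Always(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }
}
```

Under (1) a PDF would be rejected unless something upstream has already turned 
it into text; under (2) it would be sent as base64 no matter what it contains.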


Please advise.






> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not 
> BASE64
> -------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1433
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>            Reporter: Steph van Schalkwyk
>            Assignee: Karl Wright
>
> Would love to have Tika spout TEXT, not BASE64.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
