[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057167#comment-16057167 ]

Karl Wright commented on CONNECTORS-1433:
-----------------------------------------

We have not had any issues with the Tika output format so far, and Tika has 
been integrated for several years.  Tests on documents such as Excel 
spreadsheets, PDFs, text files, and Word documents have so far yielded text 
output, not base64.  I don't know what has changed, if anything, but if these 
formats now generate base64, it sounds like a contract change of some kind that 
we missed.


> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not 
> BASE64
> -------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1433
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
>             Project: ManifoldCF
>          Issue Type: Wish
>          Components: Tika extractor
>            Reporter: Steph van Schalkwyk
>
> Would love to have Tika spout TEXT, not BASE64.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
