[ 
https://issues.apache.org/jira/browse/CONNECTORS-954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036739#comment-14036739
 ] 

Karl Wright commented on CONNECTORS-954:
----------------------------------------

Added the field mapping tab: r1603687

Tomorrow I will revamp the Amazon connector to remove the Tika transformer
within it.  Still unanswered: (a) whether there's a good way to stream the
extracted content to Amazon, and (b) how to remove newline characters, as is
currently done.  Ideally, we'd construct the JSON on the fly, but I don't know
how realistic that would be.  Also, quoting may need to be addressed.
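
To make the "JSON on the fly" idea concrete, here is a minimal sketch of one
way it might look. It is not the connector's code: the class and method names
are hypothetical, and replacing each newline with a single space is an
assumption about what the current newline removal does. The point is only
that newline stripping and quoting can both happen per character while the
content is copied from a Reader to a Writer, so the extracted text never has
to be held in memory as one string.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper for illustration; not part of ManifoldCF.
class StreamingJsonSketch {

  // Copies text from 'in' to 'out' as a single JSON string literal,
  // escaping each character as it is read rather than buffering it all.
  static void writeJsonString(Reader in, Writer out) throws IOException {
    out.write('"');
    char[] buf = new char[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      for (int i = 0; i < n; i++) {
        char c = buf[i];
        switch (c) {
          case '\n':
          case '\r':
            // Assumption: newline characters are replaced by a space.
            out.write(' ');
            break;
          case '"':
            out.write("\\\"");
            break;
          case '\\':
            out.write("\\\\");
            break;
          default:
            if (c < 0x20) {
              // Remaining control characters become six-character
              // JSON unicode escapes.
              out.write(String.format("\\u%04x", (int) c));
            } else {
              out.write(c);
            }
        }
      }
    }
    out.write('"');
  }

  public static void main(String[] args) throws IOException {
    // A StringWriter stands in for the stream to the remote service.
    StringWriter out = new StringWriter();
    out.write("{\"content\":");
    writeJsonString(new StringReader("He said \"hi\"\nand left"), out);
    out.write("}");
    System.out.println(out);
  }
}
```

In a real connector the Writer would presumably wrap the HTTP request body,
and whether that is practical depends on whether the request can be sent
without knowing its length up front, which is the open question in (a).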


> Amazon Cloud Search connector's use of Tika should be revisited after 
> pipelines are added
> -----------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-954
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-954
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Amazon CloudSearch output connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>
> Amazon Cloud Search connector uses Tika to extract content from binaries.
> When the pipeline support in CONNECTORS-946 is committed to trunk, we should 
> do two things:
> (a) Create a Transformation Connection that extracts binary data into 
> metadata, and
> (b) Remove the Tika dependency from the Amazon connector



--
This message was sent by Atlassian JIRA
(v6.2#6252)
