[ https://issues.apache.org/jira/browse/CONNECTORS-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059970#comment-16059970 ]
Steph van Schalkwyk commented on CONNECTORS-1433:
-------------------------------------------------
In the scenario below, what would be the equivalent of the "my_data" field name
in the ES connector?
PUT my_index/my_type/my_id?pipeline=attachment
{
  "my_data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my_index/my_type/my_id
{
  "found": true,
  "_index": "my_index",
  "_type": "my_type",
  "_id": "my_id",
  "_version": 1,
  "_source": {
    "my_data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
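For reference: the ingest-attachment plugin's `attachment` processor reads its Base64 input from the source field named by its `field` option and writes the extracted data to `target_field` (`attachment` by default). So the field name the connector PUTs and the pipeline's `field` value have to agree. The minimal sketch below assumes the `attachment` pipeline in the URL above was defined with `"field": "my_data"` (the pipeline definition is not shown in the scenario). It only builds the two request bodies as strings and sends nothing over HTTP. The class and method names are hypothetical.

```java
// Hypothetical sketch: builds the two request bodies for the scenario above.
// Assumes field names need no JSON escaping (true for "my_data").
public class AttachmentPipelineSketch {

    // Body for: PUT _ingest/pipeline/attachment
    // The attachment processor's "field" option names the source field that
    // holds the Base64 data; extracted output goes to "attachment" by default.
    static String pipelineDefinition(String sourceField) {
        return "{\"description\":\"Extract attachment information\","
            + "\"processors\":[{\"attachment\":{\"field\":\"" + sourceField + "\"}}]}";
    }

    // Body for: PUT my_index/my_type/my_id?pipeline=attachment
    // The key here must equal the "field" value used in pipelineDefinition.
    static String documentBody(String sourceField, String base64Data) {
        return "{\"" + sourceField + "\":\"" + base64Data + "\"}";
    }

    public static void main(String[] args) {
        String field = "my_data";
        System.out.println(pipelineDefinition(field));
        System.out.println(documentBody(field,
            "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="));
    }
}
```

Both bodies are built from the same `field` variable, so the two names cannot drift apart.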
When the connector PUTs the document, what field name does it use? I found this
in ElasticSearchIndex.java (line 202):
// Since ES 1.0
pw.print(" \"_content\" : \"");
Base64 base64 = new Base64();
base64.encodeStream(inputStream, pw);
pw.print("\"}");
so I assumed the field was _content, but using _content as the pipeline's
source field doesn't work. I'll investigate further.
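If the field name in ElasticSearchIndex.java were made configurable, the body could be written the same way as the excerpt above but under any name. The sketch below is an assumption, not ManifoldCF code. It uses the JDK's `java.util.Base64` (Java 8+) instead of the connector's own `Base64` class and streams the `InputStream` through an encoder into the `Writer`, so the content is not buffered in full. For the TEXT case in the issue title, it also adds a variant that writes already-extracted text as an escaped JSON string. All class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch, not ManifoldCF code: writes a document body whose
// field name is a parameter instead of the hard-coded "_content".
public class ConfigurableFieldSketch {

    // Writes s as a JSON string literal, escaping quotes, backslashes and control chars.
    static void writeJsonString(Writer w, String s) throws IOException {
        w.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  w.write("\\\""); break;
                case '\\': w.write("\\\\"); break;
                case '\n': w.write("\\n"); break;
                case '\r': w.write("\\r"); break;
                case '\t': w.write("\\t"); break;
                default:
                    if (c < 0x20) w.write(String.format("\\u%04x", (int) c));
                    else w.write(c);
            }
        }
        w.write('"');
    }

    // BASE64 mode: {"<field>":"<Base64 of the stream>"}, encoded chunk by chunk.
    static void writeBase64Document(Writer w, String field, InputStream in) throws IOException {
        w.write('{');
        writeJsonString(w, field);
        w.write(":\"");
        // Adapter: Base64 output is pure ASCII, so each byte maps to one char.
        OutputStream toWriter = new OutputStream() {
            @Override public void write(int b) throws IOException { w.write(b); }
        };
        // Closing the encoder flushes the final group and padding; closing
        // toWriter is a no-op, so the Writer itself stays open.
        try (OutputStream enc = Base64.getEncoder().wrap(toWriter)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                enc.write(buf, 0, n);
            }
        }
        w.write("\"}");
    }

    // TEXT mode: {"<field>":"<extracted text>"} with no Base64 step.
    static void writeTextDocument(Writer w, String field, String text) throws IOException {
        w.write('{');
        writeJsonString(w, field);
        w.write(':');
        writeJsonString(w, text);
        w.write('}');
    }

    // Convenience wrappers that return the body as a String.
    static String base64Document(String field, byte[] data) {
        StringWriter sw = new StringWriter();
        try {
            writeBase64Document(sw, field, new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    static String textDocument(String field, String text) {
        StringWriter sw = new StringWriter();
        try {
            writeTextDocument(sw, field, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        byte[] rtf = "{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(base64Document("my_data", rtf));
        System.out.println(textDocument("content", "Lorem ipsum dolor sit amet"));
    }
}
```

With the RTF bytes that the scenario's Base64 string decodes to, `base64Document("my_data", rtf)` yields the same body as the PUT above. The pipeline's `field` would then only need to name whatever field the connector is configured to write.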
> Add CLI options to pipeline modules, e.g. allow Tika to export TEXT, not BASE64
> -------------------------------------------------------------------------------
>
> Key: CONNECTORS-1433
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1433
> Project: ManifoldCF
> Issue Type: Wish
> Components: Tika extractor
> Reporter: Steph van Schalkwyk
> Assignee: Karl Wright
> Attachments: CONNECTORS-1433.patch
>
>
> Would love to have Tika spout TEXT, not BASE64.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)