Binh, thanks. With your help I think I am closer to the answer. With the 
sample mapping you provided, I should be able to provide the base64 
contents of the image file as the "content" field, and the OCR text as the 
"text" field. So, when the OCR text is searched, I can return the "content", 
which is the image. With the above mapping I believe the image is saved in 
the _source as well as in the field used for highlighting purposes. Can I 
prevent it from being stored in _source with something like this?

startObject("_source").field("enabled","no").endObject()
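
To make the question concrete, here is a rough sketch of where I assume that object would sit: at the type level, beside "properties". It uses plain maps rather than the XContentBuilder so the nesting is easy to see. The class and method names are made up for illustration, and I wrote a boolean false instead of "no" on the assumption that it is the safer spelling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: builds the mapping as plain maps so the nesting is easy to see.
// The class and method names are hypothetical, not part of any Elasticsearch API.
class SourceDisabledMapping {

    // Hypothetical helper: returns the mapping for one document type,
    // with "_source" placed beside "properties" at the type level.
    static Map<String, Object> buildMapping(String documentType) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("enabled", false);

        Map<String, Object> text = new LinkedHashMap<>();
        text.put("type", "string");

        Map<String, Object> file = new LinkedHashMap<>();
        file.put("type", "attachment");

        Map<String, Object> properties = new LinkedHashMap<>();
        properties.put("text", text);
        properties.put("file", file);

        Map<String, Object> type = new LinkedHashMap<>();
        type.put("_source", source);
        type.put("properties", properties);

        Map<String, Object> root = new LinkedHashMap<>();
        root.put(documentType, type);
        return root;
    }

    // Minimal JSON writer for nested maps, strings and booleans (no escaping).
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(",");
                }
                sb.append("\"").append(e.getKey()).append("\":").append(toJson(e.getValue()));
                first = false;
            }
            return sb.append("}").toString();
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        return String.valueOf(value); // booleans and numbers
    }

    public static void main(String[] args) {
        // Prints the mapping JSON for a type named "doc".
        System.out.println(toJson(buildMapping("doc")));
    }
}
```

If that placement is right, my understanding is that anything I still want returned from a search would then have to come from stored fields, since the original document would no longer be kept.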

On Thursday, February 27, 2014 8:29:25 AM UTC-5, Binh Ly wrote:
>
> You certainly can add a new field, and then just put the OCR text into 
> that new field. So for example:
>
> Mapping:
>
>         PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
>             client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
>                 XContentFactory.jsonBuilder().startObject()
>                     .field(DOCUMENT_TYPE).startObject()
>                         .field("properties").startObject()
>                             .field("text").startObject()
>                                 .field("type", "string")
>                             .endObject()
>                             .field("file").startObject()
>                                 .field("store", "yes")
>                                 .field("type", "attachment")
>                                 .field("fields").startObject()
>                                     .field("file").startObject()
>                                         .field("store", "yes")
>                                     .endObject()
>                                 .endObject()
>                             .endObject()
>                         .endObject()
>                     .endObject()
>                 .endObject()
>         ).execute().actionGet();
>
> Then put the OCR text into the "text" field:
>
>         IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
>             .setSource(XContentFactory.jsonBuilder().startObject()
>                 .field("text", ocrText)
>                 .field("file").startObject()
>                     .field("content", fileContents)
>                     .field("_indexed_chars", -1)
>                 .endObject()
>             .endObject()
>         ).execute().actionGet();
>
> You probably don't need to index the image binary information - not sure 
> what you would need it for.
>
