Binh, thanks. With your help I think I am closer to the answer. With the 
sample mapping you provided, I should be able to provide the Base64 
contents of the image file as the "content" field, and the OCR text as the 
"text" field. So, when the OCR text is searched, I can return the "content", 
which is the image. With the above mapping, I believe the image is saved in 
the _source as well as in the field stored for "highlighting" purposes. Can I 
prevent it from being stored in _source by something like this?


On Thursday, February 27, 2014 8:29:25 AM UTC-5, Binh Ly wrote:
> You certainly can add a new field, and then just put the OCR text into 
> that new field. So for example:
> Mapping:
>         PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
>                 client.admin().indices()).setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
>                 XContentFactory.jsonBuilder().startObject()
>                     .field(DOCUMENT_TYPE).startObject()
>                         .field("properties").startObject()
>                             .field("text").startObject()
>                                 .field("type", "string")
>                             .endObject()
>                             .field("file").startObject()
>                                 .field("store", "yes")
>                                 .field("type", "attachment")
>                                 .field("fields").startObject()
>                                     .field("file").startObject()
>                                         .field("store", "yes")
>                                     .endObject()
>                                 .endObject()
>                             .endObject()
>                         .endObject()
>                     .endObject()
>                 .endObject()
>         ).execute().actionGet();
> Then put the OCR text into the "text" field:
>         IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE)
>             .setSource(XContentFactory.jsonBuilder().startObject()
>                 .field("text", ocrText)
>                 .field("file").startObject()
>                     .field("content", fileContents)
>                     .field("_indexed_chars", -1)
>                 .endObject()
>             .endObject()
>         ).execute().actionGet();
> You probably don't need to index the image binary information - not sure 
> what you would need it for.
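
For the fileContents value passed to the "content" field above, the attachment type expects the file bytes as a Base64 string. A minimal sketch of that step, assuming Java 8's java.util.Base64 is available; the class and helper names (AttachmentEncoding, encodeImage) are made up for illustration, and the bytes are a stand-in for a real image file:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AttachmentEncoding {
    // Hypothetical helper: Base64-encodes raw file bytes so they can be
    // sent as the "content" value of the attachment field.
    static String encodeImage(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void main(String[] args) {
        // Stand-in bytes; in practice these would come from something like
        // Files.readAllBytes(pathToImage).
        byte[] imageBytes = "fake image bytes".getBytes(StandardCharsets.UTF_8);

        // This string is what would be passed as fileContents when indexing.
        String fileContents = encodeImage(imageBytes);
        System.out.println(fileContents);
    }
}
```

Decoding the returned "content" with Base64.getDecoder().decode(...) should give back the original image bytes.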
