Re: indexing binary

ZenMaster80 Thu, 27 Feb 2014 11:14:47 -0800

Sorry for the confusion - I do want PDFs, but I am concerned with the 
retrieval of the image file when its OCR text is searched. I must be missing 
something.
As shown below, I provide two fields, "text" and "content". In your 
second post you say I don't need the "content" field for images? So, how 
does the search return the image to the requesting client (a web app, for 
instance) when a text match occurs on the image's OCR text? If I only 
include "text", then it will return only the text part of the image and not 
the image itself, correct?


source(XContentFactory.jsonBuilder()
    .startObject()
        // OCR text extracted from the image
        .field("text", ocrText)
        .field("file").startObject()
            // Base64-encoded string of the image file; is it needed?
            .field("content", fileContents)
            .field("_indexed_chars", -1)
        .endObject()
    .endObject()
)
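
For reference, a minimal sketch of the Base64 step that the "content" field relies on. It only shows that the image bytes survive an encode/decode round trip, so a client that receives the stored "content" string could rebuild the image. It assumes Java 8+ (java.util.Base64); the class name, method names, and placeholder bytes are hypothetical and not from the thread, and no Elasticsearch call is involved.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of the original post.
class ImageContentSketch {

    // Encode raw image bytes into the Base64 string that would be
    // passed as fileContents for the "content" field.
    static String encode(byte[] imageBytes) {
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    // Decode a stored "content" string back into the original bytes.
    static byte[] decode(String content) {
        return Base64.getDecoder().decode(content);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real image file.
        byte[] imageBytes = {(byte) 0x89, 'P', 'N', 'G'};

        String fileContents = encode(imageBytes);
        byte[] restored = decode(fileContents);

        System.out.println("content = " + fileContents);
        System.out.println("round trip ok = " + Arrays.equals(imageBytes, restored));
    }
}
```

Whether the search response actually carries the "content" value back depends on how the field is mapped and stored, which this sketch does not cover.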



On Thursday, February 27, 2014 1:16:36 PM UTC-5, Binh Ly wrote:
>
> Oh, the attachment part is for your PDF. If you don't need to index PDFs 
> then just remove that part:
>
>         PutMappingResponse putMappingResponse = new PutMappingRequestBuilder(
>             client.admin().indices())
>             .setIndices(INDEX_NAME).setType(DOCUMENT_TYPE).setSource(
>                 XContentFactory.jsonBuilder().startObject()
>                     .field(DOCUMENT_TYPE).startObject()
>                         .field("properties").startObject()
>                             .field("text").startObject()
>                                 .field("type", "string")
>                             .endObject()
>                         .endObject()
>                     .endObject()
>                 .endObject()
>         ).execute().actionGet();
>
> Indexing:
>
>         IndexResponse indexResponse = client.prepareIndex(INDEX_NAME, DOCUMENT_TYPE, "1")
>             .setSource(XContentFactory.jsonBuilder().startObject()
>                 .field("text", ocrText)
>             .endObject()
>         ).execute().actionGet();
>
>
