I index PDFs using Apache Tika (through the attachment type) with the following mapping:
    .field("type", "attachment")
    .field("fields")
    .startObject()
        .startObject("file")
            .field("store", "yes")
        .endObject()

I now want to index photos as well. I can extract the text from them using OCR, but I am not sure how to index that text. Do I treat it like any other document rather than as an attachment? After extraction the text is a plain String, not Base64 as in the case of the PDFs. I am also unclear on how it gets stored and how to make it available at search time. Can someone explain how to do this?

    XContentFactory.jsonBuilder().startObject()
        .startObject(INDEX_TYPE)
            // Disable _source so the whole Base64-encoded document is not stored
            .startObject("_source").field("enabled", "no").endObject()
            .startObject("properties")

So my photo object (jsonObject) becomes something like this. What about the source (the image itself)?

    {
        "content": "text extracted from image",
        "name": "my_photo.png"
    }

    // add to the bulk indexer for indexing
    bulkProcessor.add(Requests.indexRequest(INDEX_NAME)
        .type(INDEX_TYPE)
        .id(jsonObject.getString("name"))
        .source(jsonObject.toString()));
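
For reference, here is a minimal sketch of how the photo document could be built if the OCR text is treated as an ordinary string field rather than an attachment. The attachment type expects Base64-encoded binary content and extracts the text itself, so text that is already a String should not need it. The class name PhotoDocument, the helpers escapeJson and toJson, and the optional "image" field holding the Base64-encoded picture are all hypothetical names used only for illustration. In real code a JSON library would normally do the escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PhotoDocument {

    // Escapes the characters that may not appear raw inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds the JSON body for one photo. The OCR text goes in as a plain
    // string ("content"). The image bytes are optional: when present they are
    // Base64-encoded into a hypothetical "image" field (an assumption, not
    // something the mapping above defines).
    static String toJson(String name, String ocrText, byte[] imageBytes) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"content\":\"").append(escapeJson(ocrText)).append("\",");
        json.append("\"name\":\"").append(escapeJson(name)).append("\"");
        if (imageBytes != null) {
            json.append(",\"image\":\"")
                .append(Base64.getEncoder().encodeToString(imageBytes))
                .append("\"");
        }
        return json.append("}").toString();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real image file.
        byte[] placeholderImage = "png-bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJson("my_photo.png", "text extracted from image", null));
        System.out.println(toJson("my_photo.png", "text extracted from image", placeholderImage));
    }
}
```

The string returned by toJson could be passed to .source(...) in the bulk request in place of jsonObject.toString(). Whether to keep the image bytes in the index at all, or only a path to the file, is a separate design choice that this sketch leaves open.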