Thanks for the reply. As I understand it, with the attachment plugin the content has to be encoded before it is indexed, which sounds like an expensive operation if we have lots of PDFs. I was thinking of extracting the text from each PDF early on instead and indexing only that text. Does the plugin also work for binaries like images?
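For reference, here is a minimal sketch of the encoding step on the client side. It assumes a hypothetical field named `file` that would be mapped with `"type": "attachment"`, and that the plugin takes the base64 string directly as the field value. The raw bytes go into the JSON body as base64, which grows the payload by roughly a third before Tika parses it on the Elasticsearch side.

```python
import base64
import json


def attachment_doc(raw: bytes, field: str = "file") -> str:
    """Build a JSON body whose `field` holds the base64-encoded bytes.

    `field` is a hypothetical name; in the mapper attachment plugin it would
    be mapped with "type": "attachment" so Tika extracts text at index time.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps({field: encoded})


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 text produced for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    # Stand-in bytes only, not a real PDF.
    fake_pdf = b"%PDF-1.4\n" + bytes(range(256)) * 40
    body = attachment_doc(fake_pdf)
    # Raw size vs. encoded size: every 3 input bytes become 4 characters.
    print(len(fake_pdf), base64_size(len(fake_pdf)))
```

Extracting the text yourself (e.g. with Tika directly, as suggested in the reply below) avoids shipping the encoded binary at all; you index only the plain text, usually far smaller.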
On Thursday, January 16, 2014 4:12:47 PM UTC-5, David Pilato wrote:
>
> You can use Tika by yourself (recommended). See how I did it in fsriver
> project.
> You can use mapper attachment plugin which is using Tika behind the scene
> but gives you less control IMHO.
>
> About versions, elasticsearch does not keep old versions around. If you
> need that, you have to manage it yourself.
>
> HTH
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 16 Jan 2014 at 20:42, ZenMaster80 <sabda...@gmail.com> wrote:
>
> - Is there any literature on how to index pdf documents and binary formats
> like images?
> - Versioning question: If I update an already indexed document, I believe
> ES will update the version number. I am wondering if it keeps the previous
> document, what if I needed access to the previous document?
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/a9e8f331-c4bd-4a4c-be5a-b91e4f2f0e26%40googlegroups.com
> For more options, visit https://groups.google.com/groups/opt_out.
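On David's point that old versions have to be managed by hand, one possible approach is to copy the current revision into a separate history index before overwriting it. The sketch below only builds the Bulk API request body (newline-delimited JSON). The index names `docs` and `docs_history` and the `<id>:<version>` id scheme are hypothetical. Fetching the old source and version first is left to the caller, and 1.x clusters also expect a `_type` in each action line.

```python
import json
from typing import Optional


def revision_bulk_body(doc_id: str, new_source: dict,
                       old_source: Optional[dict], old_version: Optional[int],
                       index: str = "docs",
                       history_index: str = "docs_history") -> str:
    """Return an NDJSON bulk body that archives the old revision, then
    overwrites the live document.

    Index names and the "<id>:<version>" history id are hypothetical.
    """
    lines = []
    if old_source is not None and old_version is not None:
        # Archive the current revision under a version-suffixed id.
        lines.append(json.dumps({"index": {"_index": history_index,
                                           "_id": f"{doc_id}:{old_version}"}}))
        lines.append(json.dumps(old_source))
    # Overwrite the live document; Elasticsearch increments its _version.
    lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
    lines.append(json.dumps(new_source))
    # A bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"
```

A bulk request is not transactional, so the archive write and the overwrite can succeed or fail independently. If that matters, check the per-item results in the bulk response.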