Thanks for the reply. As I understand it, the attachment plugin encodes 
content before indexing it, which sounds like an expensive operation if we 
have lots of PDFs. I was thinking of extracting the text from each PDF early 
on and indexing plain text instead.
Does the plugin also work for binaries like images?
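
To make the comparison concrete, here is a rough sketch of the two document 
shapes I have in mind. It assumes the attachment field takes the raw file as 
a base64 string; the field names ("file", "content", "source") and the sample 
values are made up for illustration, and the actual text extraction step 
(Tika or similar) is not shown.

```python
import base64
import json


def attachment_doc(raw_bytes):
    # Assumption: the attachment field receives the whole file as base64.
    # "file" is a hypothetical field name.
    return {"file": base64.b64encode(raw_bytes).decode("ascii")}


def text_doc(extracted_text, source_name):
    # Alternative: index only text that was extracted beforehand.
    # "content" and "source" are hypothetical field names.
    return {"content": extracted_text, "source": source_name}


if __name__ == "__main__":
    raw = b"%PDF-1.4 placeholder bytes " * 1000  # stand-in for a real PDF
    text = "text pulled out of the PDF"          # stand-in for extractor output

    as_attachment = json.dumps(attachment_doc(raw))
    as_text = json.dumps(text_doc(text, "report.pdf"))

    # Compare the size of the request body each approach would send.
    print(len(raw), len(as_attachment), len(as_text))
```

With base64 the request body is roughly a third larger than the file itself, 
whereas the text-only document carries just the extracted text.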

On Thursday, January 16, 2014 4:12:47 PM UTC-5, David Pilato wrote:
>
> You can use Tika yourself (recommended). See how I did it in the fsriver 
> project.
> Or you can use the mapper attachment plugin, which uses Tika behind the 
> scenes but gives you less control IMHO.
>
> About versions, elasticsearch does not keep old versions around. If you 
> need that, you have to manage it yourself.
>
> HTH
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
> On 16 Jan 2014, at 20:42, ZenMaster80 <sabda...@gmail.com> wrote:
>
> - Is there any literature on how to index PDF documents and binary formats 
> like images?
> - Versioning question: If I update an already indexed document, I believe 
> ES will update the version number. Does it keep the previous document? 
> What if I need access to the previous document?
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a9e8f331-c4bd-4a4c-be5a-b91e4f2f0e26%40googlegroups.com
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
