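It should work with the mapper attachment plugin. Remember that what you see in
_source is not what gets indexed: _source holds the Base64-encoded file you
sent, while the plugin extracts the text from it and indexes that text.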
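
To illustrate the point, here is a minimal Python sketch, not a definitive
recipe. It decodes the readable prefix of the value quoted in the question,
which shows that the stored string is Base64 rather than Unicode, and it builds
the JSON bodies you would send for indexing and searching. The field name
"file", the helper names, and the assumption that the extracted text is
searchable under the attachment field's own name are mine; the last one depends
on the plugin version you run.

```python
import base64
import json

# First 88 characters of the value quoted in the question.
# The rest of the quoted string is truncated, so it is left out here.
SOURCE_EXCERPT = (
    "PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiDQp4"
    "bWxuczpvPSJ1cm46c2NoZW1h"
)


def decode_excerpt(b64_text: str) -> str:
    """Decode a Base64 string (length must be a multiple of 4) to text."""
    return base64.b64decode(b64_text).decode("utf-8", errors="replace")


def build_attachment_doc(raw: bytes, field: str = "file") -> dict:
    """Build a document body whose `field` carries the file as Base64.

    "file" is a hypothetical field name; it must match a field that is
    mapped with type "attachment" in your index.
    """
    return {field: base64.b64encode(raw).decode("ascii")}


def build_match_query(text: str, field: str = "file") -> dict:
    """Build a match query against the text extracted from the attachment.

    Assumption: the extracted content is searchable under the attachment
    field's own name.
    """
    return {"query": {"match": {field: text}}}


if __name__ == "__main__":
    # Shows the start of the original file, not random "unicode".
    print(repr(decode_excerpt(SOURCE_EXCERPT)))
    # Stand-in bytes; in practice you would read the PDF from disk.
    doc = build_attachment_doc(b"4G LTE market")
    print(json.dumps(doc))
    print(json.dumps(build_match_query("LTE")))
```

The document body is what comes back in _source, which is why you see the
Base64 string there. The match query runs against the extracted text, not
against that string.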


As for extracting and storing the text content, fsriver does that. See
https://github.com/dadoonet/fsriver#generated-fields

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On 22 Apr 2014, at 08:44, Prashant Agrawal <prashant.agra...@paladion.net>
wrote:

Hi Rafał Kuć,
I tried doing the same, but I didn't get the result I want.
To explain the problem in detail:

1) I have a pdf file which has the text as "There is already a big market
for mid-range 4G LTE market, being pushed by telecom operators and device
manufacturers."

2) I indexed this file in ES, and when I checked in ES, the content present was
in Unicode, like
"PGh0bWwgeG1sbnM6dj0idXJuOnNjaGVtYXMtbWljcm9zb2Z0LWNvbTp2bWwiDQp4bWxuczpvPSJ1cm46c2NoZW1hHAtZXF1aXY9Q"

3) So if I search for "LTE", it won't return any result because the content
stored in ES is in Unicode format.

So my question is: is there any way or any plugin to store the PDF content in
normal string format so that I can search on top of it?



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/Can-we-perform-the-text-search-present-in-the-images-or-pdf-files-through-elasticsearch-tp4054367p4054541.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/1398149086168-4054541.post%40n3.nabble.com.
For more options, visit https://groups.google.com/d/optout.
