If you don’t index the content, you won’t be able to search it.
That said, Tika can cap the number of characters it extracts. See the
indexedChars argument in the call below [1]:

tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars);
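
To make the effect of such a limit concrete, here is a minimal stdlib-only sketch. It does not use Tika or the mapper code; the class name IndexedCharsSketch, the method readAtMost and the "negative means no limit" convention are all hypothetical and only illustrate keeping the first N characters of extracted text before it is indexed:

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper, not part of Tika or Elasticsearch: it only mimics
// the idea of an extracted-characters limit such as indexedChars.
final class IndexedCharsSketch {

    // Reads at most `limit` characters from `text`.
    // Assumption of this sketch: a negative limit means "no limit".
    static String readAtMost(Reader text, int limit) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[4096];
        while (limit < 0 || out.length() < limit) {
            // Never ask for more characters than the remaining budget.
            int wanted = limit < 0
                    ? buffer.length
                    : Math.min(buffer.length, limit - out.length());
            int read = text.read(buffer, 0, wanted);
            if (read == -1) {
                break; // end of the extracted text
            }
            out.append(buffer, 0, read);
        }
        return out.toString();
    }
}
```

For example, readAtMost(new java.io.StringReader("The quick brown fox"), 9) returns "The quick". Anything past the limit never reaches the index, so it cannot be found by a search.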

[1] https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
<https://twitter.com/elasticsearchfr> | @scrutmydocs 
<https://twitter.com/scrutmydocs>



> On 10 Feb 2015, at 09:24, sreedevi s <sreedevi.payik...@gmail.com> wrote:
> 
> Hi,
>    What is the best way to search attachments in Lucene? I am new to
> Lucene and am using version 4.10.2. Using Tika, I know I can convert
> files to text and then index that text as another field. But for large
> files that will not be the ideal solution. I believe the maximum number of
> characters per field is 10,000. So what would be the ideal method to
> search attachments then?
> 
> 
> Best Regards,
> Sreedevi S
