If you don't index the content, you won't be able to search for it. That said, Tika lets you cap the number of characters it extracts. See how indexedChars is passed in [1]:

    tika().parseToString(new BytesStreamInput(content, false), metadata, indexedChars);

[1] https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L456

--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr <https://twitter.com/elasticsearchfr> | @scrutmydocs <https://twitter.com/scrutmydocs>

> On 10 Feb 2015, at 09:24, sreedevi s <sreedevi.payik...@gmail.com> wrote:
>
> Hi,
> Which is the best method to search in attachments in Lucene? I am new
> to Lucene and I am using version 4.10.2. By making use of Tika, I know I
> can convert files to text and then index it as another field. But for large
> files that will not be the ideal solution. I believe the maximum characters
> per field is 10,000. So, what would be the ideal method to search attachments then?
>
> Best Regards,
> Sreedevi S
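
If you are on plain Lucene rather than the mapper plugin, the third argument to parseToString above is the part that matters: it is the maximum number of characters to keep from the extracted text. Here is a minimal sketch of that capping idea using only the JDK. The class LimitedExtract and the method readAtMost are hypothetical names made up for this illustration, not part of Tika or Lucene, and the StringReader only stands in for whatever Reader of extracted text you have (for example the one returned by Tika's parse(InputStream)). This is not how Tika implements its limit internally.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only; not part of Tika or Lucene.
class LimitedExtract {

    // Reads at most maxChars characters from the reader and returns them.
    // Anything beyond the limit is never read, so it is never indexed.
    static String readAtMost(Reader reader, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[4096];
        int remaining = maxChars;
        try {
            while (remaining > 0) {
                int n = reader.read(buffer, 0, Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of input reached before the limit
                }
                out.append(buffer, 0, n);
                remaining -= n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The StringReader is a stand-in for extracted attachment text.
        Reader extracted = new StringReader("The quick brown fox jumps over the lazy dog");
        System.out.println(readAtMost(extracted, 9)); // prints "The quick"
    }
}
```

The string you get back is what you would put into the Lucene field. Text past the limit is not searchable, which is the trade-off mentioned at the top of this reply.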