You seriously don't want ngrams of length 3..250! That's ENORMOUS: every token gets expanded into every substring between 3 and 250 characters long. Typically you set min/max to 3 or 4, and that's it.
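To get a feel for how fast this blows up, here is a rough back-of-the-envelope sketch in Python. It assumes the filter emits every substring whose length lies between min_gram and max_gram; `ngram_count` is a hypothetical helper written for this illustration, not part of Elasticsearch.

```python
def ngram_count(token_length: int, min_gram: int, max_gram: int) -> int:
    """Count the n-grams produced for a single token of the given length.

    Assumes one gram per substring of each length from min_gram up to
    max_gram (capped at the token length).
    """
    total = 0
    for n in range(min_gram, min(max_gram, token_length) + 1):
        # A token of length L contains L - n + 1 substrings of length n.
        total += token_length - n + 1
    return total


if __name__ == "__main__":
    # An ordinary 10-character word.
    print(ngram_count(10, 3, 4))     # 15
    print(ngram_count(10, 3, 250))   # 36
    # A 250-character chunk of base64.
    print(ngram_count(250, 3, 4))    # 495
    print(ngram_count(250, 3, 250))  # 30876
```

So under these assumptions a single long token goes from a few hundred grams to tens of thousands once max is raised to 250.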
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching

On 20 June 2014 16:05, Tanguy Bernard <bernardtanguy1...@gmail.com> wrote:
> Thank you Cédric Hourcade!
>
> On Friday, 20 June 2014 15:32:29 UTC+2, Cédric Hourcade wrote:
>
>> If your base64-encoded strings are long, the standard tokenizer is
>> going to split them into a lot of tokens.
>>
>> These tokens are often going to be a lot longer than standard words,
>> so your nGram filter will generate even more tokens, a lot more than
>> with standard text. That may be your problem there.
>>
>> You should really try to strip the encoded images from your documents
>> with a simple regex before indexing them. If you need to keep the
>> source, put the raw text in an unindexed field, and the cleaned one in
>> another.
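The stripping step Cédric suggests could look roughly like the sketch below. It assumes the images are embedded as `data:image/...;base64,...` URIs, which the thread does not confirm; the field names `content_raw` and `content` are made up for the example, and the raw field would still need to be mapped as unindexed on the Elasticsearch side.

```python
import re

# Assumption: encoded images appear as data URIs such as
# data:image/png;base64,iVBORw0KGgo=
DATA_URI_IMAGE = re.compile(
    r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}"
)


def strip_encoded_images(text: str) -> str:
    """Remove base64 image payloads so they never reach the analyzer."""
    return DATA_URI_IMAGE.sub("", text)


def build_document(raw: str) -> dict:
    """Keep the raw source in one field and the cleaned text in another.

    Field names are hypothetical; only the cleaned field is meant to be
    analyzed with the nGram filter.
    """
    return {
        "content_raw": raw,
        "content": strip_encoded_images(raw),
    }


if __name__ == "__main__":
    raw = 'Hello <img src="data:image/png;base64,iVBORw0KGgo="> world'
    doc = build_document(raw)
    print(doc["content"])  # Hello <img src=""> world
```

If the images are encoded some other way, the pattern would need adjusting, but the two-field idea stays the same.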