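Yes, I did not know how nGram works! I found a perfect solution for my picture (base64) problem: use *'char_filter' => array('html_strip')*. The html_strip character filter removes HTML markup before tokenization, so base64 images embedded in tags never reach the tokenizer or the nGram filter.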
public function createSetting($pf)
{
    // $pf is the name of the index to create
    $params = array(
        'index' => $pf,
        'body'  => array(
            'settings' => array(
                'number_of_shards'   => 5,
                'number_of_replicas' => 0,
                'analysis' => array(
                    'filter' => array(
                        // nGram token filter: emits every 3- to 20-character gram of each token.
                        // (token_chars removed: it is an nGram *tokenizer* setting and
                        // has no effect on the nGram token filter.)
                        'MYnGram' => array(
                            'type'     => 'nGram',
                            'min_gram' => 3,
                            'max_gram' => 20,
                        ),
                    ),
                    'analyzer' => array(
                        'reuters' => array(
                            'type'        => 'custom',
                            // strip HTML markup (and the base64 images inside it) before tokenizing
                            'char_filter' => array('html_strip'),
                            'tokenizer'   => 'standard',
                            'filter'      => array('lowercase', 'asciifolding', 'MYnGram'),
                        ),
                    ),
                ),
            ),
        ),
    );
    $this->elasticsearchClient->indices()->create($params);
}

Thanks to all of you!

On Saturday, 21 June 2014 00:35:39 UTC+2, Clinton Gormley wrote:
>
> You seriously don't want 3..250 length ngrams!!!! That's ENORMOUS.
>
> Typically set min/max to 3 or 4, and that's it.
>
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching
>
> On 20 June 2014 16:05, Tanguy Bernard <bernardt...@gmail.com> wrote:
>
>> Thank you Cédric Hourcade!
>>
>> On Friday, 20 June 2014 15:32:29 UTC+2, Cédric Hourcade wrote:
>>
>>> If your base64-encoded images are long, the standard tokenizer is going
>>> to split them into a lot of tokens.
>>>
>>> These tokens are often going to be a lot longer than standard words,
>>> so your nGram filter will generate even more tokens, a lot more than
>>> with standard text. That may be your problem there.
>>>
>>> You should really try to strip the encoded images from your documents
>>> with a simple regex before indexing them. If you need to keep the
>>> source, put the raw text in an unindexed field, and the cleaned one in
>>> another.
>>>
>> --
>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b62f4e12-1b54-4621-986a-93411404f7af%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
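To put Clinton's warning in numbers: an nGram token filter emits, for a token of length L, (L - n + 1) grams for every gram size n from min_gram up to max_gram (sizes longer than the token emit nothing). The sketch below only does that arithmetic. The token lengths 8 (an ordinary word) and 100 (a long run of base64 characters) are hypothetical, and it does not model how the standard tokenizer actually splits base64 text.

```php
<?php
// Count the grams an nGram token filter would emit for one token of
// $len characters: each gram size n contributes ($len - n + 1) grams,
// and gram sizes longer than the token contribute none.
function ngramCount(int $len, int $minGram, int $maxGram): int
{
    $total = 0;
    for ($n = $minGram; $n <= min($maxGram, $len); $n++) {
        $total += $len - $n + 1;
    }
    return $total;
}

// Hypothetical token lengths: 8 (a normal word) vs. 100 (a long base64 run),
// compared for min/max = 3..4, 3..20 and 3..250.
foreach ([[8, 3, 4], [8, 3, 20], [8, 3, 250], [100, 3, 4], [100, 3, 20], [100, 3, 250]] as [$len, $min, $max]) {
    printf("len=%3d min=%d max=%3d -> %5d grams\n", $len, $min, $max, ngramCount($len, $min, $max));
}
```

For the 100-character token this gives 195 grams with 3..4, 1611 with the 3..20 setting used above, and 4851 with 3..250, all for a single token.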
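Cédric's alternative, stripping the encoded images with a regex before indexing and keeping the raw text in a separate unindexed field, could look roughly like this. It is a minimal sketch: the function names, the field names content_raw and content, and the regex (which only targets data:<mime>;base64,<payload> URIs) are assumptions, not something from the thread.

```php
<?php
// Hypothetical helper: remove inline base64 data URIs, e.g. the src of
// <img src="data:image/png;base64,...">, from a text before indexing.
// Assumption: images appear only as data:<mime>;base64,<payload> URIs.
function stripBase64Images(string $html): string
{
    $cleaned = preg_replace('#data:[a-z]+/[a-z0-9.+-]+;base64,[A-Za-z0-9+/=]+#i', '', $html);
    // preg_replace returns null on error; fall back to the original text
    return $cleaned ?? $html;
}

// Hypothetical document builder: keep the raw source in one field and the
// cleaned text in another. Only 'content' is meant to be analyzed.
function buildDocument(string $html): array
{
    return array(
        'content_raw' => $html,
        'content'     => stripBase64Images($html),
    );
}
```

In an Elasticsearch 1.x-era mapping, the raw field would be declared as a string with "index": "no", so it stays in _source but is never tokenized or nGrammed.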