Re: problem indexing with my analyzer

Clinton Gormley Fri, 20 Jun 2014 15:36:11 -0700

You seriously don't want ngrams of length 3..250! That's ENORMOUS.

Typically you set both min and max to 3 or 4, and that's it.
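
To put rough numbers on that, here is a small Python sketch that counts how many character ngrams a single token produces for a given min/max. The function names are made up for illustration, and it assumes every substring whose length falls within the bounds is emitted once per position, so treat the figures as an estimate and not as the filter's exact output.

```python
def ngram_count(token_len: int, min_gram: int, max_gram: int) -> int:
    """Count the character ngrams produced for one token of length token_len."""
    total = 0
    # Grams longer than the token itself cannot exist, so cap at token_len.
    for n in range(min_gram, min(max_gram, token_len) + 1):
        total += token_len - n + 1  # one gram per start position
    return total


def ngrams(token: str, min_gram: int, max_gram: int) -> list:
    """List the grams themselves, shortest first (illustrative helper)."""
    return [
        token[i:i + n]
        for n in range(min_gram, min(max_gram, len(token)) + 1)
        for i in range(len(token) - n + 1)
    ]


if __name__ == "__main__":
    # One 250-character token, e.g. a chunk of base64 text.
    print(ngram_count(250, 3, 250))  # 30876
    print(ngram_count(250, 3, 4))    # 495
    print(ngrams("elastic", 3, 4))
```

Under those assumptions a single 250-character token yields 30876 grams with 3..250, against 495 with 3..4.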


http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html#_ngrams_for_partial_matching


On 20 June 2014 16:05, Tanguy Bernard <bernardtanguy1...@gmail.com> wrote:

> Thank you, Cédric Hourcade!
>
> On Friday, 20 June 2014 15:32:29 UTC+2, Cédric Hourcade wrote:
>
>> If your base64-encoded strings are long, they are going to be split
>> into a lot of tokens by the standard tokenizer.
>>
>> These tokens are often going to be a lot longer than standard words,
>> so your nGram filter will generate even more tokens, a lot more than
>> with standard text. That may be your problem there.
>>
>> You should really try to strip the encoded images with a simple regex
>> from your documents before indexing them. If you need to keep the
>> source, put the raw text in an unindexed field, and the cleaned one in
>> another.
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/b62f4e12-1b54-4621-986a-93411404f7af%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/b62f4e12-1b54-4621-986a-93411404f7af%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
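
For the regex suggestion quoted above, here is a minimal Python sketch of stripping the encoded images before indexing. It assumes the images are embedded as "data:image/...;base64," URIs, which the thread does not confirm, and the field names content_raw and content are hypothetical; the raw field would then be mapped as not indexed.

```python
import re

# Assumption: images are inlined as data URIs. The payload pattern matches
# the base64 alphabet plus up to two "=" padding characters.
DATA_URI_IMAGE = re.compile(
    r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/]+={0,2}"
)


def split_for_indexing(raw: str) -> dict:
    """Return the untouched source plus a copy with encoded images removed."""
    cleaned = DATA_URI_IMAGE.sub(" ", raw)
    return {
        "content_raw": raw,   # hypothetical field, kept for display only
        "content": cleaned,   # hypothetical field, analyzed with ngrams
    }


if __name__ == "__main__":
    doc = split_for_indexing(
        'Hello <img src="data:image/png;base64,iVBORw0KGgo="> world'
    )
    print(doc["content"])
```

Only the cleaned field goes through the ngram analyzer, so the long base64 tokens never reach it.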

