On Wed, May 21, 2014 at 6:01 PM, Erik Rose <grinche...@gmail.com> wrote:
> I'm trying to move Mozilla's source code search engine (dxr.mozilla.org)
> from a custom-written SQLite trigram index to ES. In the current production
> incarnation, we support fast regex (and, by extension, wildcard) searches by
> extracting trigrams from the search pattern and paring down the documents to
> those containing said trigrams.

This is definitely a great approach for a database, but it won't work
exactly the same way for an inverted index because the datastructure
is totally different.

In the inverted index queries like wildcards are slow: they must
iterate and match all terms in the document collection, then intersect
those postings with the rest of your query. So because its inverted,
it works backwards from what you expect and thats why adding
additional intersections like 'AND' don't speed anything up, they
haven't happened yet.

N-grams can speed up partial matching in general, but the methods to
accomplish this are different: usually the best way to go about it is
to try to think about Analyzing the data in such a way that the
queries to accomplish what you need are as basic as possible.

The first question is if you really need partial matching at all: I
don't have much knowledge about your use case, but just going from
your example, i would look at wildcards like "*Children*Next*" and ask
if instead i'd want to ensure my analyzer split on case-changes, and
try to see if i could get what i need with a sloppy phrase query.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAMUKNZUS40rsAjmzrL_YK6yjgjZRumeQKFVPhVu9bUcW4nN_KA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to