On Wed, May 21, 2014 at 6:01 PM, Erik Rose <grinche...@gmail.com> wrote:
> I'm trying to move Mozilla's source code search engine (dxr.mozilla.org)
> from a custom-written SQLite trigram index to ES. In the current production
> incarnation, we support fast regex (and, by extension, wildcard) searches by
> extracting trigrams from the search pattern and paring down the documents to
> those containing said trigrams.
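The quoted trigram approach can be sketched in a few lines of Python. This is a minimal illustration, not DXR's actual code. It assumes wildcard patterns made only of literal text and `*` (real regex support needs more careful trigram extraction from the pattern), and every function name here is hypothetical. Documents are first pared down to those containing every trigram of the pattern's literal runs, and only the survivors are checked against the full pattern.

```python
import fnmatch
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(docs):
    """Map each trigram to the set of doc ids whose content contains it."""
    index = defaultdict(set)
    for doc_id, content in docs.items():
        for gram in trigrams(content):
            index[gram].add(doc_id)
    return index


def candidate_docs(index, all_ids, pattern):
    """Keep only docs containing every trigram of the pattern's literal runs.

    Assumes the pattern uses only literal characters and '*'. Fragments
    shorter than 3 characters contribute no trigrams, so a pattern like
    '*ab*' falls back to checking every document.
    """
    required = set()
    for fragment in pattern.split("*"):
        required |= trigrams(fragment)
    candidates = set(all_ids)
    for gram in required:
        candidates &= index.get(gram, set())
    return candidates


def wildcard_search(docs, index, pattern):
    """Prefilter with trigrams, then confirm each candidate with a real match."""
    candidates = candidate_docs(index, docs.keys(), pattern)
    return sorted(d for d in candidates if fnmatch.fnmatchcase(docs[d], pattern))


docs = {
    "a.cpp": "node->ChildrenNext()",
    "b.cpp": "NextChildren()",          # has all the trigrams, wrong order
    "c.cpp": "GetChildren(); Next();",
    "d.cpp": "ChildNodes()",            # lacks "ldr", "ren": pruned early
}
index = build_trigram_index(docs)
print(wildcard_search(docs, index, "*Children*Next*"))
```

This prints `['a.cpp', 'c.cpp']`. Note that `b.cpp` survives the trigram prefilter but is rejected by the final match, which is why the verification step is needed.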
This is definitely a great approach for a database, but it won't work the same way for an inverted index, because the data structure is totally different. In an inverted index, queries like wildcards are slow: they must walk the term dictionary and test every candidate term against the pattern, then intersect the postings of the matching terms with the rest of your query. Because the index is inverted, it works backwards from what you expect. That's why adding extra clauses with 'AND' doesn't speed anything up: those intersections haven't happened yet when the wildcard is being expanded.

N-grams can speed up partial matching in general, but the methods to accomplish this are different. Usually the best approach is to analyze the data in such a way that the queries you need are as basic as possible.

The first question is whether you really need partial matching at all. I don't know much about your use case, but going just from your example, I would look at wildcards like "*Children*Next*" and ask whether I'd instead want my analyzer to split on case changes, and see if I could get what I need with a sloppy phrase query.
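To make the last suggestion concrete, here is a small pure-Python model of it. It is an illustrative sketch, not Elasticsearch code: the tokenizer regex, the positional index, and all function names are hypothetical stand-ins. (In Elasticsearch terms, the analyzer side roughly corresponds to the `word_delimiter` token filter's `split_on_case_change` option, and the query side to a `match_phrase` query with `slop`.) `expand_wildcard` shows why a wildcard is expensive: it has to test every term in the dictionary. `sloppy_phrase` uses a simplified model of phrase slop, where a document matches if the spread of (position minus offset-in-phrase) across the query terms is at most `slop`.

```python
import fnmatch
import re
from collections import defaultdict
from itertools import product

# Hypothetical tokenizer: splits on case changes and non-alphanumerics,
# e.g. "nsIDOMNode" -> ["ns", "idom", "node"].
TOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def analyze(text):
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def build_positional_index(docs):
    """term -> {doc_id: [positions]}; the keys form the term dictionary."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text)):
            index[term][doc_id].append(pos)
    return index


def expand_wildcard(index, pattern):
    """Every term in the dictionary is tested before any intersection."""
    return sorted(t for t in index if fnmatch.fnmatchcase(t, pattern))


def sloppy_phrase(index, terms, slop):
    """Docs where the terms appear within `slop` (simplified slop model)."""
    postings = [index.get(t, {}) for t in terms]
    if not postings or not all(postings):
        return []
    common = set(postings[0]).intersection(*postings[1:])
    hits = []
    for doc_id in sorted(common):
        for positions in product(*(p[doc_id] for p in postings)):
            shifted = [pos - off for off, pos in enumerate(positions)]
            if max(shifted) - min(shifted) <= slop:
                hits.append(doc_id)
                break
    return hits


docs = {
    "a.cpp": "node->getChildrenNext()",  # children, next adjacent
    "b.cpp": "NextChildren()",           # reversed order
    "c.cpp": "childrenOfNodeNext()",     # two terms in between
}
index = build_positional_index(docs)
print(expand_wildcard(index, "*ild*"))
print(sloppy_phrase(index, ["children", "next"], 0))
print(sloppy_phrase(index, ["children", "next"], 2))
```

With slop 0 only `a.cpp` matches; with slop 2 all three match, since reversing two adjacent terms and skipping two intervening terms each cost 2 under this model. The point of the design is that the expensive work (splitting identifiers) happens once at index time, so the query is a plain positional lookup instead of a scan over the term dictionary.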