Hi Irfan,

Newer versions of Lucene have a default limit on the maximum number of states that determinize will create before giving up.
It was added exactly for this reason (accidentally disastrous wildcards). The limit is necessary because Lucene converts the incoming wildcard to a DFA, which has adversarial cases (that your users are hitting!).

It is also possible, though not implemented in Lucene, to use an NFA and "be" in multiple states at once. That would likely mean slower searching for non-adversarial cases, but fast searching (compared to today) for adversarial ones.

I think it would be interesting to explore an NFA implementation for Lucene!

Mike McCandless

http://blog.mikemccandless.com

On Mon, Nov 30, 2015 at 4:46 PM, Irfan Hamid wrote:
> Lucene devs,
>
> We are hitting performance problems when our customers issue pathological
> wildcard queries. Searching the Lucene JIRA I came across these two work
> items and unfortunately it seems like there's no easy way out. However, in
> LUCENE-6672 David Causse has a couple of proposed solutions. I was wondering
> if either of those or something similar were integrated into the code-base
> down the line?
>
> If not, would the community be interested in a pull request if/when we fix
> this in our fork and bake it in production for a while?
>
> TIA,
> Irfan.
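
For illustration, the NFA idea from the reply (being in several states at once instead of determinizing first) can be sketched roughly as below. This is a minimal, self-contained Python sketch and not Lucene code: the function name wildcard_match_nfa is hypothetical, and it assumes only '*' (any run of characters) and '?' (exactly one character) are special. State i means "the first i pattern characters have been matched".

```python
def wildcard_match_nfa(pattern: str, text: str) -> bool:
    """Match text against a wildcard pattern by simulating an NFA directly.

    Hypothetical helper for illustration only; no DFA is ever built, so the
    work is bounded by len(text) * len(pattern) state visits.
    """
    n = len(pattern)

    def closure(states):
        # A '*' at position i may match nothing at all, so being in
        # state i also means being in state i + 1 (an epsilon move).
        result = set(states)
        stack = list(states)
        while stack:
            i = stack.pop()
            if i < n and pattern[i] == "*" and i + 1 not in result:
                result.add(i + 1)
                stack.append(i + 1)
        return result

    current = closure({0})
    for ch in text:
        nxt = set()
        for i in current:
            if i == n:
                continue  # accepting state has no outgoing transitions
            p = pattern[i]
            if p == "*":
                nxt.add(i)      # '*' consumes ch and stays where it is
            elif p == "?" or p == ch:
                nxt.add(i + 1)  # consume exactly one character
        current = closure(nxt)
        if not current:
            return False  # no live states left, the match cannot succeed
    return n in current
```

The set of live states never grows beyond len(pattern) + 1, so a pattern with many '*' characters costs more per input character but cannot blow up the way determinization can. That is the trade-off described in the reply: more work on ordinary patterns, bounded work on adversarial ones.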
