Hi Irfan,

Newer versions of Lucene have a default limit on the maximum number of
states that determinization will create before giving up.

It was added for exactly this reason (accidentally disastrous wildcards).
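To illustrate why such a limit matters, here is a minimal sketch, not Lucene's code or API. It performs subset construction for a simplified wildcard syntax (only `*` and `?`, no escapes) and gives up once a state cap is exceeded. The class `CappedDeterminizeSketch`, the method `determinize`, and the exception `TooManyStatesException` are hypothetical names, and the cap of 10,000 is purely illustrative. A pattern such as `*a` followed by n `?` needs 2^(n+1) DFA states, because the DFA has to remember which of the last n+1 characters were `a`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only (hypothetical names): subset construction for a simplified wildcard
// syntax ('*' and '?', no escapes) with a hard cap on DFA states. Not Lucene's code.
final class CappedDeterminizeSketch {

    // Hypothetical exception playing the role of a "too complex to determinize" error.
    static final class TooManyStatesException extends RuntimeException {
        TooManyStatesException(int limit) {
            super("determinization would exceed " + limit + " states");
        }
    }

    // NFA state i means "the first i pattern chars are matched". A '*' at i may
    // match the empty string, so being in state i also puts us in state i + 1.
    static BitSet closure(String p, BitSet states) {
        BitSet out = (BitSet) states.clone();
        for (int i = 0; i < p.length(); i++) {
            if (out.get(i) && p.charAt(i) == '*') out.set(i + 1);
        }
        return out;
    }

    // One NFA step. When other is true, c stands for "any char not literal in the pattern".
    static BitSet step(String p, BitSet cur, char c, boolean other) {
        BitSet next = new BitSet(p.length() + 1);
        for (int i = cur.nextSetBit(0); i >= 0 && i < p.length(); i = cur.nextSetBit(i + 1)) {
            char pc = p.charAt(i);
            if (pc == '*') next.set(i);                                  // '*' consumes and stays
            else if (pc == '?' || (!other && pc == c)) next.set(i + 1);  // '?' or literal advances
        }
        return closure(p, next);
    }

    // Returns the number of live DFA states, or throws once more than maxStates are needed.
    static int determinize(String p, int maxStates) {
        Set<Character> literals = new TreeSet<>();
        for (char c : p.toCharArray()) if (c != '*' && c != '?') literals.add(c);

        BitSet init = new BitSet();
        init.set(0);
        BitSet start = closure(p, init);

        Set<BitSet> seen = new HashSet<>();   // each DFA state is a set of NFA states
        Deque<BitSet> work = new ArrayDeque<>();
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            BitSet s = work.poll();
            // One outgoing edge per literal char, plus one for the "other" class.
            List<BitSet> targets = new ArrayList<>();
            for (char c : literals) targets.add(step(p, s, c, false));
            targets.add(step(p, s, '\0', true));
            for (BitSet t : targets) {
                if (t.isEmpty() || seen.contains(t)) continue;  // dead state or already built
                if (seen.size() >= maxStates) throw new TooManyStatesException(maxStates);
                seen.add(t);
                work.add(t);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 13; n++) {
            String pattern = "*a" + "?".repeat(n);
            try {
                System.out.println(pattern + " -> " + determinize(pattern, 10_000) + " states");
            } catch (TooManyStatesException e) {
                System.out.println(pattern + " -> " + e.getMessage());
            }
        }
    }
}
```

With the illustrative cap of 10,000, the pattern with twelve `?` still fits (8,192 states) and the one with thirteen is rejected. A real engine has to make the same kind of trade: refuse the query rather than spend unbounded time and memory building the automaton.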

This is necessary because Lucene converts the incoming wildcard pattern to a
DFA, and some patterns are adversarial: the number of DFA states can grow
exponentially with the pattern length (that is what your users are hitting!).
It is also possible, though not implemented in Lucene, to use an NFA instead
and "be" in multiple states at once.  That would likely mean slower searching
for non-adversarial patterns, but far faster searching than today for
adversarial ones.
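To make the NFA idea concrete, here is a minimal sketch, assuming the same simplified wildcard syntax (`*` and `?` only, no escapes, one `char` per character). The class `WildcardNfaSketch` and its `matches` method are hypothetical; this is not Lucene code, and it checks one term at a time, whereas Lucene runs its automaton against the terms dictionary. Instead of determinizing, it tracks the set of NFA states that are live after each character.

```java
import java.util.BitSet;

// Sketch only (hypothetical names): match a term against a simplified wildcard
// pattern by simulating the NFA directly, never building a DFA.
final class WildcardNfaSketch {

    // State i means "the first i pattern chars are matched"; a '*' at i may match
    // the empty string, so being in state i also puts us in state i + 1.
    static BitSet closure(String p, BitSet states) {
        BitSet out = (BitSet) states.clone();
        for (int i = 0; i < p.length(); i++) {
            if (out.get(i) && p.charAt(i) == '*') out.set(i + 1);
        }
        return out;
    }

    static boolean matches(String p, String text) {
        BitSet init = new BitSet();
        init.set(0);
        BitSet current = closure(p, init);   // "be" in several states at once
        for (int t = 0; t < text.length(); t++) {
            char c = text.charAt(t);
            BitSet next = new BitSet(p.length() + 1);
            for (int i = current.nextSetBit(0); i >= 0 && i < p.length(); i = current.nextSetBit(i + 1)) {
                char pc = p.charAt(i);
                if (pc == '*') next.set(i);                          // '*' consumes and stays
                else if (pc == '?' || pc == c) next.set(i + 1);      // '?' or literal advances
            }
            current = closure(p, next);
            if (current.isEmpty()) return false;   // no live states left: cannot match
        }
        return current.get(p.length());             // accept iff the final state is live
    }

    public static void main(String[] args) {
        // The kind of pattern that needs 2^21 DFA states, matched without a DFA.
        String adversarial = "*a" + "?".repeat(20);
        System.out.println(matches(adversarial, "a" + "x".repeat(20)));  // prints true
        System.out.println(matches(adversarial, "x".repeat(21)));        // prints false
    }
}
```

Each character costs at most one pass over the pattern, so matching a term is O(term length × pattern length) with no exponential blowup. The price is that a precompiled DFA does a single transition lookup per character, which is why the NFA approach would likely be slower for ordinary patterns.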

I think it would be interesting to explore an NFA implementation for Lucene!

Mike McCandless

http://blog.mikemccandless.com


On Mon, Nov 30, 2015 at 4:46 PM, Irfan Hamid
<[email protected]> wrote:
> Lucene devs,
>
> We are hitting performance problems when our customers issue pathological
> wildcard queries. Searching the Lucene JIRA, I came across these two work
> items, and unfortunately it seems like there's no easy way out. However, in
> LUCENE-6672 David Causse has a couple of proposed solutions. I was wondering
> if either of those, or something similar, was integrated into the code base
> later on?
>
> If not, would the community be interested in a pull request if/when we fix
> this in our fork and bake it in production for a while?
>
> TIA,
> Irfan.

