[https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850545#action_12850545]
Michael McCandless commented on LUCENE-2351:
--------------------------------------------
The attached patch improves the sneaky wildcard query "un*t" (on a 5M-doc Wikipedia
index, matching 1058 terms and 124623 docs) from 39.69 QPS to 44.85 QPS (best
of 5) on flex. But trunk is at 63.19 QPS, so we still have more to do...
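For reference, the relative numbers behind this comment work out as follows. This is plain arithmetic on the figures reported above (best-of-5 QPS and match counts), not a new measurement:

```latex
\begin{aligned}
\text{patched flex vs. unpatched flex} &= \frac{44.85}{39.69} \approx 1.130 \quad (\approx 13\%\ \text{faster})\\
\text{trunk vs. patched flex} &= \frac{63.19}{44.85} \approx 1.409 \quad (\text{trunk still} \approx 41\%\ \text{faster})\\
\text{average docs per matching term} &= \frac{124623}{1058} \approx 117.8
\end{aligned}
```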
> optimize automatonquery
> -----------------------
>
> Key: LUCENE-2351
> URL: https://issues.apache.org/jira/browse/LUCENE-2351
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: Flex Branch
> Reporter: Robert Muir
> Priority: Minor
> Fix For: Flex Branch
>
> Attachments: LUCENE-2351.patch
>
>
> Mike found a few cases in flex where AutomatonQuery behaves badly.
> The problem is similar to one a database query planner faces: sometimes
> simply doing a full table scan is faster than using an index.
> We can optimize AutomatonQuery a little and get better performance for
> fuzzy, wildcard, and regex queries.
> Here is a list of ideas:
> * create commonSuffixRef for infinite automata, not just the really bad
> linear-scan cases
> * do a null check rather than populating an empty commonSuffixRef
> * localize the 'linear' case to not seek, but instead scan, when ping-ponging
> against loops in the state machine
> * add a mechanism to enable/disable the terms dict cache; e.g., we can disable
> it for infinite cases, and maybe for fuzzy N>1 too.
> * change the use of BitSet to OpenBitSet or long[] gen for path-tracking
> * optimize the backtracking code where it says /* String is good to go as-is */;
> this need not be a full run(), I think...
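The commonSuffixRef and null-check ideas in the list above can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `SuffixPrefilter` is not a Lucene class, the common suffix is assumed to be precomputed from the automaton (for "un*t" it would be "t"), and a plain `Predicate<byte[]>` stands in for actually running the automaton. It only shows the shape of the check: a cheap suffix comparison rejects many terms before the full run, and storing `null` instead of an empty suffix reduces the no-suffix case to a single null check.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

/** Hypothetical sketch: a cheap common-suffix pre-check in front of a full automaton run. */
public class SuffixPrefilter {
    private final byte[] commonSuffix;          // null means "no useful suffix": skip the check
    private final Predicate<byte[]> fullMatch;  // stand-in for running the automaton (assumption)

    public SuffixPrefilter(byte[] commonSuffix, Predicate<byte[]> fullMatch) {
        // Normalize an empty suffix to null so accept() pays only a null check.
        this.commonSuffix = (commonSuffix == null || commonSuffix.length == 0) ? null : commonSuffix;
        this.fullMatch = fullMatch;
    }

    /** Byte-wise suffix comparison, similar in spirit to comparing raw term bytes. */
    static boolean endsWith(byte[] term, byte[] suffix) {
        int offset = term.length - suffix.length;
        if (offset < 0) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (term[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public boolean accept(byte[] term) {
        if (commonSuffix != null && !endsWith(term, commonSuffix)) {
            return false; // rejected without running the automaton
        }
        return fullMatch.test(term);
    }

    public static void main(String[] args) {
        // Stand-in for the automaton of "un*t": starts with "un" and ends with "t".
        Predicate<byte[]> unStarT = b -> {
            String s = new String(b, StandardCharsets.UTF_8);
            return s.startsWith("un") && s.endsWith("t");
        };
        SuffixPrefilter f = new SuffixPrefilter("t".getBytes(StandardCharsets.UTF_8), unStarT);
        for (String t : new String[] {"unit", "under", "unt", "ut"}) {
            System.out.println(t + " -> " + f.accept(t.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

Running `main` prints `unit -> true`, `under -> false`, `unt -> true` and `ut -> false`; only "under" is rejected by the suffix check alone, while "ut" passes it and is rejected by the full match.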
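The "long[] gen for path-tracking" idea from the list above can be sketched as follows, assuming it refers to the common generation-stamp trick (one plausible reading; the issue text does not spell it out). Instead of clearing a BitSet before each new path through the state machine, each visited state is stamped with the current generation, and bumping the generation "clears" every state in O(1). `PathTracker`, `newPath` and `visit` are hypothetical names, not Lucene API.

```java
/** Hypothetical sketch of generation-stamped path tracking over automaton states. */
public class PathTracker {
    private final long[] visited; // visited[s] == curGen means state s is on the current path
    private long curGen = 1;      // starts at 1 so the zero-initialized array reads as "not visited"

    public PathTracker(int numStates) {
        visited = new long[numStates];
    }

    /** Forgets all visited states in O(1) by starting a new generation. */
    public void newPath() {
        curGen++;
    }

    /** Marks state s; returns false if s was already visited on this path (i.e. a loop). */
    public boolean visit(int s) {
        if (visited[s] == curGen) {
            return false;
        }
        visited[s] = curGen;
        return true;
    }

    public static void main(String[] args) {
        PathTracker t = new PathTracker(4);
        System.out.println(t.visit(2)); // true: first visit on this path
        System.out.println(t.visit(2)); // false: loop detected
        t.newPath();
        System.out.println(t.visit(2)); // true again after the O(1) reset
    }
}
```

The trade-off is memory: 64 bits per state instead of one, in exchange for never touching the whole array when a new path starts.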
--
This message is automatically generated by JIRA.