[
https://issues.apache.org/jira/browse/LUCENE-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2351:
--------------------------------
Attachment: LUCENE-2351.patch
Attached is the same patch as before, except that it includes a random test for
Automaton.
I stole the code from TestStressIndexing; the test creates random Unicode terms
and random regular expressions,
and verifies them against a brain-dead query that just brute-forces every term.
This found two unrelated bugs:
* automaton didn't handle the 'empty term' correctly.
* there was a logic bug in UnicodeUtil.nextValidUTF16String
Both are also fixed in the patch... will commit soon.
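
A minimal sketch of the random-testing idea described above, not the test from the patch. It assumes a much simpler setting: the "smart" path is a prefix match that seeks into a sorted term set, and the brute-force path runs an equivalent java.util.regex pattern over every term. The class and method names (RandomDifferentialTest, seekMatch, bruteForce, countMismatches) and the tiny alphabet are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
import java.util.regex.Pattern;

class RandomDifferentialTest {
    // Hypothetical small alphabet (ASCII, Latin-1, one supplementary code point)
    // so random terms collide often and surrogate pairs get exercised.
    static final int[] ALPHABET = {'a', 'b', 0x00E9, 0x1F600};

    static String randomTerm(Random r, int maxCodePoints) {
        StringBuilder sb = new StringBuilder();
        int len = r.nextInt(maxCodePoints + 1); // 0 is allowed: the empty term
        for (int i = 0; i < len; i++) {
            sb.appendCodePoint(ALPHABET[r.nextInt(ALPHABET.length)]);
        }
        return sb.toString();
    }

    // "Smart" path: seek to the prefix in the sorted set, stop at the first non-match.
    static List<String> seekMatch(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break;
            hits.add(t);
        }
        return hits;
    }

    // Brain-dead path: test every term against the equivalent regular expression.
    static List<String> bruteForce(TreeSet<String> terms, String prefix) {
        Pattern p = Pattern.compile(Pattern.quote(prefix) + ".*", Pattern.DOTALL);
        List<String> hits = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) hits.add(t);
        }
        return hits;
    }

    // Runs both paths on random prefixes and counts disagreements.
    static int countMismatches(long seed, int iterations) {
        Random r = new Random(seed);
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 200; i++) terms.add(randomTerm(r, 4));
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            String prefix = randomTerm(r, 3);
            if (!seekMatch(terms, prefix).equals(bruteForce(terms, prefix))) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 1000));
    }
}
```

Allowing zero-length terms and prefixes is deliberate: that is the kind of edge case (the 'empty term') a hand-written test tends to skip. A fixed seed keeps any failure reproducible.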
> optimize automatonquery
> -----------------------
>
> Key: LUCENE-2351
> URL: https://issues.apache.org/jira/browse/LUCENE-2351
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: Flex Branch
> Reporter: Robert Muir
> Priority: Minor
> Fix For: Flex Branch
>
> Attachments: LUCENE-2351.patch, LUCENE-2351.patch, LUCENE-2351.patch,
> LUCENE-2351_infinite.patch, LUCENE-2351_infinite.patch
>
>
> Mike found a few cases in flex where we have some bad behavior with
> automatonquery.
> The problem is similar to a database query planner, where sometimes simply
> doing a full table scan is faster than using an index.
> We can optimize automatonquery a little bit and get better performance for
> fuzzy, wildcard, and regex queries.
> Here is a list of ideas:
> * create commonSuffixRef for infinite automata, not just really-bad linear
> scan cases
> * do a null check rather than populating an empty commonSuffixRef
> * localize the 'linear' case to not seek, but instead scan, when ping-ponging
> against loops in the state machine
> * add a mechanism to enable/disable the terms dict cache, e.g. we can disable
> it for infinite cases, and maybe fuzzy N>1 also.
> * change the use of BitSet to OpenBitSet or long[] gen for path-tracking
> * optimize the backtracking code where it says
> /* String is good to go as-is */; this need not be a full run(), I think...
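
One possible reading of the "long[] gen" idea in the list above, sketched under assumptions: instead of clearing a BitSet of visited states before each traversal, keep a long[] of generation stamps and bump a counter to start a new pass. This is only an illustration of that general technique; the class GenVisited and its methods are hypothetical and are not taken from the patch or from Lucene.

```java
// Hypothetical helper: tracks which automaton states were visited in the
// current pass without ever clearing the backing array.
class GenVisited {
    // visited[s] == curGen means state s was already seen in the current pass.
    private final long[] visited;
    // Starts at 1 so the zero-initialized array reads as "nothing visited".
    private long curGen = 1;

    GenVisited(int numStates) {
        visited = new long[numStates];
    }

    // Starts a new pass in O(1); all states become unvisited again.
    void newPass() {
        curGen++;
    }

    // Marks a state; returns true only on the first visit within this pass.
    boolean mark(int state) {
        if (visited[state] == curGen) {
            return false;
        }
        visited[state] = curGen;
        return true;
    }

    public static void main(String[] args) {
        GenVisited v = new GenVisited(4);
        System.out.println(v.mark(2)); // first visit in this pass
        System.out.println(v.mark(2)); // repeat visit, a loop was detected
        v.newPass();
        System.out.println(v.mark(2)); // first visit again after the reset
    }
}
```

The trade-off is memory for time: 64 bits per state instead of one, in exchange for a constant-time reset between passes.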