[ https://issues.apache.org/jira/browse/LUCENE-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034513#comment-14034513 ]

Michael McCandless commented on LUCENE-5752:
--------------------------------------------

Thanks Rob.

bq. concatenate: as mentioned before, we rely on this today in quite a few 
places, and now the runtime has significantly changed (when the left side is a 
singleton)

Well, in RegExp we follow up that concatenate with a minimize.  In
WildcardQuery the incoming automata are small anyway... and I fixed
LevA to insert the prefix itself to avoid the full copy of the fuzzy
suffix part.

bq. singleton: speaking of such, this optimization is removed, but are we sure 
about this? In practice this is probably extremely effective, maybe even 
outweighing any other optimizations we could do.

I really didn't like this duality / mutability (how you sometimes had
to call expandSingleton for ops that cared) and I don't see where this
opto would really make a difference in Lucene.  We already have
DaciukMihov to efficiently build minimal union automaton ...

I agree for a general purpose automaton library this might make sense
... but I don't think it really helps Lucene.

bq. regex/wildcard parsing: we should really test that this isn't totally crazy 
(read: blowing up) now.

I was worried about this too but when I looked at RegExp it calls
minimize after all of these ops!  So I think the added cost of the
copy is likely in the noise ...

bq. acceptStates: should this really be a hashset? is there a reason not to use 
a bitset?

Hmm, it could be a bitset.  I thought that typically the number of
accept states is small, but I agree in the case when it's large it'd
be nice to not use way way too much RAM ... I'll change it to a bitset.
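
A minimal sketch of the bitset option, assuming states are dense ints starting at 0 (AcceptStatesSketch and its methods are hypothetical names, not Lucene's API). A HashSet<Integer> pays for a boxed Integer plus a hash entry per accept state, while a java.util.BitSet costs about one bit per state up to the highest state id, so it stays small even when most states accept.

```java
import java.util.BitSet;

/** Hypothetical sketch: accept states of an int-state automaton kept in a BitSet. */
class AcceptStatesSketch {
  // One bit per state id; memory grows with the highest state id set,
  // not with the number of accept states times per-entry object overhead.
  private final BitSet accept = new BitSet();

  void setAccept(int state, boolean isAccept) {
    accept.set(state, isAccept);
  }

  boolean isAccept(int state) {
    return accept.get(state);
  }

  /** Number of accept states (population count of the bitset). */
  int acceptCount() {
    return accept.cardinality();
  }
}
```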


> Explore light weight Automaton replacement
> ------------------------------------------
>
>                 Key: LUCENE-5752
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5752
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0
>
>         Attachments: LUCENE-5752.patch
>
>
> This effort started with the patch on LUCENE-4556, to create a "light
> weight" replacement for the current object-heavy Automaton class
> (which creates separate State and Transition objects).
> I took that initial patch much further, and cut over most places in
> Lucene that use Automaton to LightAutomaton.  Tests pass.
> The core idea of LightAutomaton is all states are ints, and you build
> up the automaton under the restriction that you add all outgoing
> transitions one state at a time.  This worked well for most
> operations, but for some (e.g. UTF32ToUTF8!!) it was harder, so I also
> added a separate builder to add transitions in any order and then in
> the end they are sorted and added to the real automaton.
> If this is successful I think we should just replace the current
> Automaton with LightAutomaton; right now they both exist in my current
> patch...
> This is very much a work in progress, and I'm not sure the
> restrictions the API imposes are "reasonable" (some algos got uglier).
> But I think it's at least worth exploring/iterating... I'll make a branch and
> commit my current state.
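
A minimal sketch of the int-state idea described above, assuming state 0 is the initial state and labels are Unicode code points. All names here (IntAutomatonSketch, IntAutomatonBuilderSketch, addTransition, finish, run) are hypothetical and do not reflect the actual LightAutomaton API. States are plain ints, transitions are packed into one int[] as (dest, min, max) triples, and the automaton rejects new transitions for a state once the caller has moved on to another state. The builder accepts transitions in any order, sorts them by source state, and replays them into the restricted automaton.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: all transitions of a state must be added consecutively. */
class IntAutomatonSketch {
  private int[] stateStart = new int[4]; // offset of each state's first transition
  private int[] stateCount = new int[4]; // number of transitions per state
  private int[] transitions = new int[12]; // packed (dest, min, max) triples
  private int numStates;
  private int nextTransition; // next free slot in transitions
  private int curState = -1; // state currently receiving transitions
  private final BitSet accept = new BitSet();

  int createState() {
    if (numStates == stateStart.length) {
      stateStart = Arrays.copyOf(stateStart, 2 * numStates);
      stateCount = Arrays.copyOf(stateCount, 2 * numStates);
    }
    return numStates++;
  }

  void setAccept(int state, boolean isAccept) {
    accept.set(state, isAccept);
  }

  void addTransition(int source, int dest, int min, int max) {
    if (source >= numStates || dest >= numStates) {
      throw new IllegalArgumentException("unknown state");
    }
    if (source != curState) {
      // Moving to a new source state: an earlier, already finished state is not allowed.
      if (stateCount[source] != 0) {
        throw new IllegalStateException("state " + source + " was already finished");
      }
      curState = source;
      stateStart[source] = nextTransition;
    }
    if (nextTransition + 3 > transitions.length) {
      transitions = Arrays.copyOf(transitions, 2 * transitions.length);
    }
    transitions[nextTransition++] = dest;
    transitions[nextTransition++] = min;
    transitions[nextTransition++] = max;
    stateCount[source]++;
  }

  /** Linear scan of the state's contiguous slice; returns -1 if no transition matches. */
  int step(int state, int label) {
    int start = stateStart[state];
    for (int i = 0; i < stateCount[state]; i++) {
      int t = start + 3 * i;
      if (transitions[t + 1] <= label && label <= transitions[t + 2]) {
        return transitions[t];
      }
    }
    return -1;
  }

  boolean run(String s) {
    int state = 0;
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      state = step(state, cp);
      if (state == -1) {
        return false;
      }
      i += Character.charCount(cp);
    }
    return accept.get(state);
  }
}

/** Hypothetical builder: buffers transitions in any order, then sorts and replays them. */
class IntAutomatonBuilderSketch {
  private final List<int[]> pending = new ArrayList<>();
  private final BitSet accept = new BitSet();
  private int numStates;

  int createState() {
    return numStates++;
  }

  void setAccept(int state, boolean isAccept) {
    accept.set(state, isAccept);
  }

  void addTransition(int source, int dest, int min, int max) {
    pending.add(new int[] {source, dest, min, max});
  }

  IntAutomatonSketch finish() {
    IntAutomatonSketch a = new IntAutomatonSketch();
    for (int i = 0; i < numStates; i++) {
      a.createState();
      a.setAccept(i, accept.get(i));
    }
    // Group by source state (then by min label) so the one-state-at-a-time rule holds.
    pending.sort(Comparator.comparingInt((int[] t) -> t[0]).thenComparingInt(t -> t[2]));
    for (int[] t : pending) {
      a.addTransition(t[0], t[1], t[2], t[3]);
    }
    return a;
  }
}
```

The restriction is what lets each state's transitions live in one contiguous slice of a single array, addressed by (start, count), with no per-state or per-transition objects. The cost is the extra buffering and sort in the builder for algorithms such as UTF32ToUTF8 that naturally emit transitions out of order.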



--
This message was sent by Atlassian JIRA
(v6.2#6252)
