[ 
https://issues.apache.org/jira/browse/LUCENE-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190773#comment-13190773
 ] 

Michael McCandless commented on LUCENE-3714:
--------------------------------------------

Ooh, your patch is doing the same algo as mine (best-first search)... yours is 
much simpler/smaller though :)

I think your tie break isn't quite right?  Because, those two arc.labels you 
use aren't necessarily "comparable", since the two paths may not be the same 
length (hmm or share the same "last parent" node)?  You need to roll back and 
do a full compare of the accumulated input labels down that path?  I struggled 
with this too...

I also took advantage of a neat property that every min/+ wFST will have 
(Robert pointed this out): the first arc (only) will have the min output, and 
then there must exist a NO_OUTPUT completion path from that arc to some final 
node.  So, I think my patch will visit paths in the same best-first order as 
yours, but I avoid the push/pop/new object into the queue for each arc 
traversed by greedily/recursively pursuing the first NO_OUTPUT arc I can find.  
I only fall back to the queue when I need to find the next top-N path to 
pursue...


                
> add suggester that uses shortest path/wFST instead of buckets
> -------------------------------------------------------------
>
>                 Key: LUCENE-3714
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3714
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/spellchecker
>            Reporter: Robert Muir
>         Attachments: LUCENE-3714.patch, LUCENE-3714.patch, TestMe.java, 
> out.png
>
>
> Currently the FST suggester (really an FSA) quantizes weights into buckets 
> (e.g. single byte) and puts them in front of the word.
> This makes it fast, but you lose granularity in your suggestions.
> Lately the question was raised, if you build lucene's FST with 
> positiveintoutputs, does it behave the same as a tropical semiring wFST?
> In other words, after completing the word, we instead traverse min(output) at 
> each node to find the 'shortest path' to the 
> best suggestion (with the highest score).
> This means we wouldnt need to quantize weights at all and it might make some 
> operations (e.g. adding fuzzy matching etc) a lot easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to