[jira] [Commented] (LUCENE-4518) Suggesters: highlighting (explicit markup of user-typed portions vs. generated portions in a suggestion)

Oliver Christ (JIRA) Wed, 16 Jan 2013 11:56:22 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555396#comment-13555396
 ]


Oliver Christ commented on LUCENE-4518:
---------------------------------------

I’ve played around with Mike’s patches, but for the AnalyzingSuggester the 
results have been mixed. Since the transition symbols in the automaton are not 
closely aligned between the surface form and the analyzed form, 
LookupResult.prefixLength (which attempts to represent the length of the 
surface string that corresponds to the lookup string) is often off by quite a 
bit, leading to very confusing highlighting in non-trivial cases. 
        
I think this is ultimately due to the way the FST is constructed, but that 
seems to be non-trivial to change.

In addition, just returning the (surface) prefix length which corresponds to 
the lookup string is not sufficient for more complex suggesters, such as “infix 
suggesters” where the user-provided string is not a prefix of the full surface 
term (google.com: type in “sox rumor”). What the suggesters would ultimately 
have to return is a list of text chunks, where each chunk carries a flag 
indicating whether it is based on the lookup string or has been auto-completed.
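
As a rough sketch of what such a chunked result could look like (the class 
names SuggestionChunk and ChunkedSuggestion are hypothetical; they exist 
neither in Lucene nor in the attached patch, and the sample suggestion below 
is made up for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical type, for illustration only; not part of the Lucene API.
final class SuggestionChunk {
    final String text;
    // true: chunk is based on the lookup string; false: auto-completed
    final boolean fromLookup;

    SuggestionChunk(String text, boolean fromLookup) {
        this.text = text;
        this.fromLookup = fromLookup;
    }
}

// Hypothetical result type: an ordered list of chunks instead of a
// single prefix length.
final class ChunkedSuggestion {
    private final List<SuggestionChunk> chunks = new ArrayList<>();

    ChunkedSuggestion add(String text, boolean fromLookup) {
        chunks.add(new SuggestionChunk(text, fromLookup));
        return this;
    }

    List<SuggestionChunk> chunks() {
        return Collections.unmodifiableList(chunks);
    }

    // The full surface form is the concatenation of all chunks.
    String surface() {
        StringBuilder sb = new StringBuilder();
        for (SuggestionChunk c : chunks) {
            sb.append(c.text);
        }
        return sb.toString();
    }

    // Wraps every chunk that stems from the lookup string in the
    // caller-supplied markers; auto-completed chunks are left as-is.
    String highlight(String open, String close) {
        StringBuilder sb = new StringBuilder();
        for (SuggestionChunk c : chunks) {
            if (c.fromLookup) {
                sb.append(open).append(c.text).append(close);
            } else {
                sb.append(c.text);
            }
        }
        return sb.toString();
    }
}
```

With this shape the UI does not need to know how the suggester matched the 
input: an infix match for “sox rumor” would simply come back as alternating 
auto-completed and lookup-based chunks.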

So at this point we are back at trying to identify the matched string portions 
by other means, which isn’t perfect either, but acceptable in most cases. :(
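
For illustration, one such “other means” could be a caller-side heuristic 
along the following lines. This is only a sketch under my own assumptions 
(HeuristicHighlighter is a hypothetical name, tokens are split on whitespace, 
and matching is limited to case-insensitive word prefixes in input order); it 
is not necessarily the approach we use, and it does nothing for diacritics 
folding or script adaptation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene.
final class HeuristicHighlighter {

    // Returns [start, end) offsets in the surface string for each lookup
    // token that matches the beginning of a word, scanning left to right.
    static List<int[]> matchRanges(String lookup, String surface) {
        List<int[]> ranges = new ArrayList<>();
        int from = 0;
        for (String token : lookup.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty lookup string
            }
            int pos = findWordPrefix(surface, token, from);
            if (pos < 0) {
                continue; // token not found: leave it unhighlighted
            }
            ranges.add(new int[] {pos, pos + token.length()});
            from = pos + token.length();
        }
        return ranges;
    }

    // Finds the first word start at or after 'from' where the surface
    // string begins with 'token', ignoring case.
    static int findWordPrefix(String surface, String token, int from) {
        for (int i = from; i + token.length() <= surface.length(); i++) {
            boolean wordStart =
                i == 0 || !Character.isLetterOrDigit(surface.charAt(i - 1));
            if (wordStart
                    && surface.regionMatches(true, i, token, 0, token.length())) {
                return i;
            }
        }
        return -1;
    }

    // Wraps the matched ranges in the caller-supplied markers.
    static String highlight(String lookup, String surface,
                            String open, String close) {
        StringBuilder sb = new StringBuilder();
        int last = 0;
        for (int[] r : matchRanges(lookup, surface)) {
            sb.append(surface, last, r[0])
              .append(open)
              .append(surface, r[0], r[1])
              .append(close);
            last = r[1];
        }
        return sb.append(surface, last, surface.length()).toString();
    }
}
```

Because the comparison only ignores case, any suggestion whose surface form 
differs from the input in other ways falls through without highlighting, 
which is the “not perfect” part.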

                
> Suggesters: highlighting (explicit markup of user-typed portions vs. 
> generated portions in a suggestion)
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4518
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4518
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Oliver Christ
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4518.patch
>
>
> As a user, I would like the lookup result of the suggestion engine to contain 
> information which allows me to distinguish the user-entered portion from the 
> autocompleted portion of a suggestion. That information can then be used for 
> e.g. highlighting. 
> *Notes:*
> It's trivial if the suggestion engine only applies simple prefix search, as 
> then the user-typed prefix is always a true prefix of the completion. 
> However, it's non-trivial as soon as you use an AnalyzingSuggester, where the 
> completion may (in extreme cases) be quite different from the user-provided 
> input. As soon as case/diacritics folding, script adaptation (kanji/hiragana) 
> come into play, the completion is no longer guaranteed to be an extension of 
> the query. Since the caller of the suggestion engine (UI) generally does not 
> know the implementation details, the required information needs to be passed 
> in the LookupResult.
> *Discussion on java-user:*
> > I haven't found a simple solution for the highlighting yet,
> > particularly when using AnalyzingSuggester (where it's non-trivial).
> Mike McCandless:
> Ahh I see ... it is challenging in that case.  Hmm.  Maybe open an issue for 
> this as well, so we can discuss/iterate?

