[
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-3888:
--------------------------------
Attachment: LUCENE-3888.patch
Updated patch (note: with this one, Solr does not yet compile).
I went the route of trying to clean up these APIs properly, and I think there
are serious problems here.
The biggest violation is code like this:
{code}
// convert to array string:
// nocommit: why don't we just return SuggestWord[] with all the information?
// consumers such as Solr must be recomputing this stuff again?!
String[] list = new String[sugQueue.size()];
for (int i = sugQueue.size() - 1; i >= 0; i--) {
  list[i] = sugQueue.pop().getSurface();
}
return list;
{code}
DirectSpellChecker already returns all this data, and I think it's doing the
right thing, but I think SpellChecker should be fixed. Even in the normal case
we are surely recomputing docFreq etc. for all the candidates, which is wasteful.
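To make the point above concrete, here is a minimal sketch of draining the queue into full suggestion objects instead of bare strings, so that a consumer does not have to recompute the score or docFreq. Everything in it is an assumption for illustration only: SuggestionSketch, Suggestion, newQueue and drainBestFirst are hypothetical names, not Lucene or Solr APIs, and it uses java.util.PriorityQueue rather than Lucene's own queue class.

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Lucene.
final class SuggestionSketch {

    /** Hypothetical value type holding everything computed for one candidate. */
    static final class Suggestion {
        final String surface; // form shown to the user
        final float score;    // similarity to the input term
        final int docFreq;    // document frequency of the candidate

        Suggestion(String surface, float score, int docFreq) {
            this.surface = surface;
            this.score = score;
            this.docFreq = docFreq;
        }
    }

    /** Min-heap: the weakest candidate (lowest score, then lowest docFreq) is the head. */
    static PriorityQueue<Suggestion> newQueue() {
        return new PriorityQueue<>((a, b) -> {
            int byScore = Float.compare(a.score, b.score);
            return byScore != 0 ? byScore : Integer.compare(a.docFreq, b.docFreq);
        });
    }

    /**
     * Empties the queue into an array ordered best first. The array is filled
     * from the back because poll() hands out the weakest candidate first.
     */
    static Suggestion[] drainBestFirst(PriorityQueue<Suggestion> queue) {
        Suggestion[] result = new Suggestion[queue.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = queue.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        PriorityQueue<Suggestion> queue = newQueue();
        queue.add(new Suggestion("lucene", 0.9f, 120));
        queue.add(new Suggestion("lucent", 0.7f, 15));
        for (Suggestion s : drainBestFirst(queue)) {
            // The consumer reads score and docFreq directly; nothing is recomputed.
            System.out.println(s.surface + " score=" + s.score + " docFreq=" + s.docFreq);
        }
    }
}
```

A caller that only wants strings can still map each element to its surface form, so returning the richer type loses nothing compared with returning String[].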
I'll keep plugging away, but it seems like this will be a pretty serious
refactoring (including, e.g., the distributed spellcheck refactoring) and
difficult for 3.6.
> split off the spell check word and surface form in spell check dictionary
> -------------------------------------------------------------------------
>
> Key: LUCENE-3888
> URL: https://issues.apache.org/jira/browse/LUCENE-3888
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/spellchecker
> Reporter: Koji Sekiguchi
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch,
> LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch
>
>
> The "did you mean?" feature built on Lucene's spell checker does not work
> well for Japanese, and this is a longstanding problem. The logic needs
> comparatively long text to check spelling, but in some languages (e.g.
> Japanese) most words are too short for the spell checker to use.
> I think that, at least for Japanese, things can be improved if we split off
> the spell check word and surface form in the spell check dictionary. Then we
> can use ReadingAttribute for spell checking but CharTermAttribute for
> suggesting, for example.