[ https://issues.apache.org/jira/browse/SOLR-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182577#comment-13182577 ]
James Dyer commented on SOLR-2993:
----------------------------------
Okke,
Thanks for looking at this patch. Here are a few comments:
{quote}
if both word parts resulted in suggestions, the collation made no sense.
{quote}
This is a problem with collations in general: by default, the spellchecker
simply mashes the top corrections together, often resulting in nonsense. The
solution is to set "spellcheck.maxCollationTries" to a non-zero value. Doing so
causes the spellchecker to vet the collation possibilities against the index,
so that any collation it returns is guaranteed to generate hits.
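For illustration only, here is a minimal sketch of a request with that
parameter set. The host, port and "/spell" handler path are assumptions about
your setup, and the value 5 is arbitrary; "spellcheck", "spellcheck.collate"
and "spellcheck.maxCollationTries" are the SpellCheckComponent parameters
discussed above.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and request handler; adjust to your own setup.
BASE_URL = "http://localhost:8983/solr/spell"


def build_spellcheck_url(query, max_collation_tries=5):
    """Build a query URL that asks for collations vetted against the index."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.collate": "true",
        # A non-zero value makes the spellchecker test candidate collations.
        "spellcheck.maxCollationTries": str(max_collation_tries),
    }
    return BASE_URL + "?" + urlencode(params)


print(build_spellcheck_url("spe llcheck"))
```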
{quote}
"spe llcheck" would give suggestions "spa" and "spellcheck" and collate this
into "spa spellcheck"
{quote}
This is surprising to me and might indicate a bug. This patch is designed to
carefully ensure that when building collations, the corrections do not overlap
one another. For instance, if "q=spe llcheck" and it gives corrections of
"spe>spa" and "spe llcheck>spellcheck", it should not collate these to "q=spa
spellcheck" because "spe" overlaps with "spe llcheck". So if you can describe
in detail what you're indexing and querying (maybe paste the resulting xml), it
would help me figure out what's going on. Better yet, if you can write a
failing unit test and post a patch...
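To make the intended rule concrete, here is a toy sketch of "corrections must
not overlap". It is not the code from the patch: the function names and the
(start, end, replacement) token-range representation are assumptions made up
for this example.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two (start, end, text) corrections cover a common token."""
    return a[0] < b[1] and b[0] < a[1]


def collations(tokens, corrections):
    """Apply every set of pairwise non-overlapping corrections to the query.

    Each correction is (start, end, replacement), where start/end are token
    positions in the original query and end is exclusive.
    """
    results = []
    for size in range(1, len(corrections) + 1):
        for combo in combinations(corrections, size):
            # Skip any combination in which two corrections overlap.
            if any(overlaps(x, y) for x, y in combinations(combo, 2)):
                continue
            out, pos = [], 0
            for start, end, text in sorted(combo):
                out.extend(tokens[pos:start])  # untouched tokens
                out.append(text)               # the replacement
                pos = end
            out.extend(tokens[pos:])
            results.append(" ".join(out))
    return results


tokens = ["spe", "llcheck"]
corrections = [(0, 1, "spa"), (0, 2, "spellcheck")]
print(collations(tokens, corrections))
```

With these inputs the two corrections both cover "spe", so they are never
combined and "spa spellcheck" is not produced.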
{quote}
I never got any results back when one of the parts had a typo. So "spe llchek"
would not give any suggestions.
{quote}
This patch does not have the ability to first correct a word fragment and then
combine it with another fragment to make a corrected word. Possibly this would
be a good next step after what we've got here already gets worked out.
{quote}
it would also be handy if "spell check" would result in the suggestion
"spellcheck". Or is this already possible?
{quote}
This is the core of what this issue (really LUCENE-3523) is all about, provided
that "spellcheck" is in the dictionary and index you're using.
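As a rough sketch of the idea (not the WordBreakSpellChecker implementation),
combining adjacent terms and keeping only the joins found in the dictionary
could look like this. The function name and the toy dictionary are
hypothetical.

```python
def suggest_combinations(terms, dictionary):
    """Join each adjacent pair of terms; keep joins found in the dictionary."""
    suggestions = []
    for i in range(len(terms) - 1):
        combined = terms[i] + terms[i + 1]
        if combined in dictionary:
            # Record the position of the first term and the combined word.
            suggestions.append((i, combined))
    return suggestions


# Toy stand-in for the terms in the index or dictionary.
toy_dictionary = {"spellcheck", "solr"}
print(suggest_combinations(["spell", "check"], toy_dictionary))
```

If "spellcheck" were missing from the toy dictionary, no suggestion would be
returned, which matches the proviso above.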
> Integrate WordBreakSpellChecker with Solr
> -----------------------------------------
>
> Key: SOLR-2993
> URL: https://issues.apache.org/jira/browse/SOLR-2993
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud, spellchecker
> Affects Versions: 4.0
> Reporter: James Dyer
> Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2993.patch
>
>
> A SpellCheckComponent enhancement, leveraging the WordBreakSpellChecker from
> LUCENE-3523:
> - Detect spelling errors resulting from misplaced whitespace without the use
> of shingle-based dictionaries.
> - Seamlessly integrate word-break suggestions with single-word spelling
> corrections from the existing FileBased-, IndexBased- or Direct- spell
> checkers.
> - Provide collation support for word-break errors including cases where the
> user has a mix of single-word spelling errors and word-break errors in the
> same query.
> - Provide shard support.