[ https://issues.apache.org/jira/browse/SOLR-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Dyer updated SOLR-2010:
-----------------------------

    Attachment: SOLR-2010.patch

Tested against branch version #96633

> Improvements to SpellCheckComponent Collate functionality
> ---------------------------------------------------------
>
>                 Key: SOLR-2010
>                 URL: https://issues.apache.org/jira/browse/SOLR-2010
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java, spellchecker
>    Affects Versions: 1.4.1
>         Environment: Tested against trunk revision 966633
>            Reporter: James Dyer
>            Priority: Minor
>         Attachments: SOLR-2010.patch
>
>
> Improvements to SpellCheckComponent Collate functionality
>
> Our project requires a better Spell Check Collator. I'm contributing this as
> a patch to get suggestions for improvements and in case there is a broader
> need for these features.
>
> 1. Only return collations that are guaranteed to result in hits if re-queried
> (applying original fq params also). This is especially helpful when there is
> more than one correction per query. The 1.4 behavior does not verify that a
> particular combination will actually return hits.
> 2. Provide the option to get multiple collation suggestions.
> 3. Provide extended collation results including the # of hits re-querying
> will return and a breakdown of each misspelled word and its correction.
>
> This patch is similar to what is described in SOLR-507 item #1. Also, this
> patch provides a viable workaround for the problem discussed in SOLR-1074. A
> dictionary could be created that combines the terms from the multiple fields.
> The collator then would prune out any spurious suggestions this would cause.
>
> This patch adds the following spellcheck parameters:
>
> 1. spellcheck.maxCollationTries - maximum # of collation possibilities to try
> before giving up. Lower values ensure better performance. Higher values may
> be necessary to find a collation that can return results. Default is 0,
> which maintains backwards-compatible behavior (do not check collations).
> 2. spellcheck.maxCollations - maximum # of collations to return. Default is
> 1, which maintains backwards-compatible behavior.
> 3. spellcheck.collateExtendedResult - if true, returns an expanded response
> format detailing collations found. Default is false, which maintains
> backwards-compatible behavior. When true, output is like this (in context):
>
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="hopq">
>       <int name="numFound">94</int>
>       <int name="startOffset">7</int>
>       <int name="endOffset">11</int>
>       <arr name="suggestion">
>         <str>hope</str>
>         <str>how</str>
>         <str>hope</str>
>         <str>chops</str>
>         <str>hoped</str>
>         etc
>       </arr>
>     </lst>
>     <lst name="faill">
>       <int name="numFound">100</int>
>       <int name="startOffset">16</int>
>       <int name="endOffset">21</int>
>       <arr name="suggestion">
>         <str>fall</str>
>         <str>fails</str>
>         <str>fail</str>
>         <str>fill</str>
>         <str>faith</str>
>         <str>all</str>
>         etc
>       </arr>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(how AND fails)</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="hopq">how</str>
>         <str name="faill">fails</str>
>       </lst>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(hope AND faith)</str>
>       <int name="hits">2</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="hopq">hope</str>
>         <str name="faill">faith</str>
>       </lst>
>     </lst>
>     <lst name="collation">
>       <str name="collationQuery">Title:(chops AND all)</str>
>       <int name="hits">1</int>
>       <lst name="misspellingsAndCorrections">
>         <str name="hopq">chops</str>
>         <str name="faill">all</str>
>       </lst>
>     </lst>
>   </lst>
> </lst>
>
> In addition, SolrJ is updated to include
> SpellCheckResponse.getCollatedResults(), which will return the expanded
> Collation format. getCollatedResult(), which returns a single String, is
> retained for backwards-compatibility. Other APIs were not changed but will
> still work provided that spellcheck.collateExtendedResult is false.
>
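For readers who want to consume the extended collation format without SolrJ, the following is a minimal sketch that reads the `collation` entries out of a response shaped like the XML above, using only the JDK's DOM parser. It is not part of the patch: the class `CollationParser`, its nested `Collation` holder, and the `SAMPLE` constant are hypothetical names introduced for illustration, and the sample is a trimmed copy of the response shown in the description.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class CollationParser {

    /** Hypothetical holder for one extended collation entry (not a SolrJ type). */
    public static final class Collation {
        public final String query;
        public final int hits;
        // Misspelled word -> chosen correction, in response order.
        public final Map<String, String> corrections = new LinkedHashMap<>();

        Collation(String query, int hits) {
            this.query = query;
            this.hits = hits;
        }
    }

    /** Trimmed copy of the extended response from the issue description. */
    public static final String SAMPLE =
        "<lst name=\"spellcheck\"><lst name=\"suggestions\">"
        + "<lst name=\"collation\">"
        + "<str name=\"collationQuery\">Title:(how AND fails)</str>"
        + "<int name=\"hits\">2</int>"
        + "<lst name=\"misspellingsAndCorrections\">"
        + "<str name=\"hopq\">how</str><str name=\"faill\">fails</str>"
        + "</lst></lst>"
        + "<lst name=\"collation\">"
        + "<str name=\"collationQuery\">Title:(chops AND all)</str>"
        + "<int name=\"hits\">1</int>"
        + "<lst name=\"misspellingsAndCorrections\">"
        + "<str name=\"hopq\">chops</str><str name=\"faill\">all</str>"
        + "</lst></lst>"
        + "</lst></lst>";

    /** Collects every <lst name="collation"> element found in the XML. */
    public static List<Collation> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<Collation> result = new ArrayList<>();
        NodeList lists = doc.getElementsByTagName("lst");
        for (int i = 0; i < lists.getLength(); i++) {
            Element lst = (Element) lists.item(i);
            if (!"collation".equals(lst.getAttribute("name"))) {
                continue;
            }
            String query = null;
            int hits = 0;
            Element corrections = null;
            // Inspect only the direct children of this collation entry.
            for (Node n = lst.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element)) {
                    continue;
                }
                Element e = (Element) n;
                String name = e.getAttribute("name");
                if ("collationQuery".equals(name)) {
                    query = e.getTextContent().trim();
                } else if ("hits".equals(name)) {
                    hits = Integer.parseInt(e.getTextContent().trim());
                } else if ("misspellingsAndCorrections".equals(name)) {
                    corrections = e;
                }
            }
            Collation c = new Collation(query, hits);
            if (corrections != null) {
                for (Node n = corrections.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n instanceof Element) {
                        Element e = (Element) n;
                        c.corrections.put(e.getAttribute("name"), e.getTextContent().trim());
                    }
                }
            }
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Collation c : parse(SAMPLE)) {
            System.out.println(c.query + " hits=" + c.hits + " " + c.corrections);
        }
    }
}
```

With SolrJ on the classpath, the SpellCheckResponse.getCollatedResults() method described above would presumably make this kind of manual parsing unnecessary; the sketch only illustrates the shape of the data.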
> This likely will not return valid results if using Shards. Rather, a more
> robust interaction with the index would be necessary than what exists in
> SpellCheckCollator.collate().