[ https://issues.apache.org/jira/browse/SOLR-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12874765#action_12874765 ]
Khaled Hammouda commented on SOLR-1630:
---------------------------------------

We just hit this bug as well. To reproduce, index a document containing a word with a hyphen (or underscore), then search with a misspelled version of the indexed text; e.g.

document contains: mid-term
query: mis-term
result: exception thrown

I looked at the code where this happens, and it seems to be related to the token offsets of the tokenized query, in conjunction with a feature of the spellcheck component called collation. Basically, collation tries to replace the original query with the top suggested words. It relies on the tokenizer's offsets to remove the original misspelled words and insert the suggested ones (using StringBuilder.replace). Unfortunately, the token offsets look weird for words with hyphens (or underscores); for example:

query: abc_def
1st token: value = abc; startOffset = 0; endOffset = 7
2nd token: value = def; startOffset = 0; endOffset = 7

Because the two tokens occupy the same range (0-7), the replacement logic breaks: once the first replacement has changed the length of the string, the second one is applied to offsets that are no longer valid. I'm not sure whether this tokenizer behavior is correct, but it's part of the problem.

Having said that, I tried changing the spellcheck tokenizer from standard to whitespace, and this actually solved the problem: no errors, and I get correct suggestions.

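To illustrate the offset problem, here is a simplified sketch in plain Java. It is not the actual SpellCheckComponent code: the Correction class, the collateNaive method, and the suggestion strings are all made up for this example, and the running-shift bookkeeping is only my assumption about how the replacement works. It splices suggestions into the query with StringBuilder.replace, first with the offsets one would expect (0-3 and 4-7), then with the overlapping offsets reported above (0-7 for both tokens).

```java
import java.util.Arrays;
import java.util.List;

public class CollationOffsetSketch {

    /** Hypothetical stand-in for a query token's offsets plus its top suggestion. */
    static final class Correction {
        final int startOffset;
        final int endOffset;
        final String suggestion;

        Correction(int startOffset, int endOffset, String suggestion) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.suggestion = suggestion;
        }
    }

    /**
     * Naive collation: replace each token's range with its suggestion,
     * tracking how much earlier replacements changed the string length.
     * This assumes the token ranges do not overlap.
     */
    static String collateNaive(String query, List<Correction> corrections) {
        StringBuilder collation = new StringBuilder(query);
        int shift = 0; // cumulative length change from earlier replacements
        for (Correction c : corrections) {
            collation.replace(c.startOffset + shift, c.endOffset + shift, c.suggestion);
            shift += c.suggestion.length() - (c.endOffset - c.startOffset);
        }
        return collation.toString();
    }

    public static void main(String[] args) {
        String query = "abc_def";

        // Offsets one would expect: each token covers only its own characters.
        List<Correction> disjoint = Arrays.asList(
                new Correction(0, 3, "abd"),
                new Correction(4, 7, "deg"));
        System.out.println(collateNaive(query, disjoint)); // prints "abd_deg"

        // Offsets as reported above: both tokens claim the whole range 0-7.
        List<Correction> overlapping = Arrays.asList(
                new Correction(0, 7, "abd"),
                new Correction(0, 7, "deg"));
        try {
            collateNaive(query, overlapping);
        } catch (StringIndexOutOfBoundsException e) {
            // After the first replacement the shift is 3 - 7 = -4, so the
            // second replace is called with a negative start index.
            System.out.println("collation failed: " + e.getMessage());
        }
    }
}
```

Under the same assumption, the 14-character query from the original report (awehjse-wjkekw) with a 7-character first suggestion would give a shift of 7 - 14 = -7, which is consistent with the "String index out of range: -7" in the stack trace.
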
So, until this gets fixed you can either:
1) Disable spellchecker collation, or
2) Use a whitespace tokenizer for the spellchecker component

> StringIndexOutOfBoundsException in SpellCheckComponent
> ------------------------------------------------------
>
>                 Key: SOLR-1630
>                 URL: https://issues.apache.org/jira/browse/SOLR-1630
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis, spellchecker
>    Affects Versions: 1.4
>         Environment: Solr 1.4
> Lucene 2.9.1
> Win XP
> java version "1.6.0_14"
>            Reporter: Robin Wojciki
>            Assignee: Shalin Shekhar Mangar
>         Attachments: bug.xml, schema.xml, SOLR-1630.patch, solrconfig.xml, spellcheckconfig.xml
>
>
> For some documents/search strings, the SpellCheckComponent throws StringIndexOutOfBoundsException
> See: http://www.lucidimagination.com/search/document/3be6555227e031fc/
> h2. Replication
> * Save attached schema.xml and solrconfig.xml in apache-solr-1.4.0/example/solr/conf
> * Start Solr
> * Index attached bug.xml
> * Query [http://localhost:8983/solr/select/?q=awehjse-wjkekw]
> It throws a StringIndexOutOfBoundsException
> {noformat}
> String index out of range: -7
> java.lang.StringIndexOutOfBoundsException: String index out of range: -7
>       at java.lang.AbstractStringBuilder.replace(Unknown Source)
>       at java.lang.StringBuilder.replace(Unknown Source)
>       at org.apache.solr.handler.component.SpellCheckComponent.toNamedList(SpellCheckComponent.java:248)
>       at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:143)
>       at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
>       at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>       at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>       at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>       at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> {noformat}