[
https://issues.apache.org/jira/browse/SOLR-13360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379107#comment-17379107
]
Rainer Jung commented on SOLR-13360:
------------------------------------
The existing code tries to replace tokens in the original query with
spell-checked alternatives. It assumes that the replacements occur in
increasing position order in the original query.
The part of the query string to replace is identified by the startOffset and
endOffset recorded in the replacement token. These refer to the original
character indexes in the query string.
If you replace multiple tokens from left to right in a string, and the
substrings to replace are given by indexes into that string, you have to be
careful whenever the length of a replacement differs from the length of the
original substring. If the replacement is longer, the start and end indexes of
all replacements further to the right must be increased by the difference; if
it is shorter, they must be decreased. The code does exactly this calculation
and accumulates the net increase or decrease in the variable offset.
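A minimal sketch of this bookkeeping, assuming a hypothetical Replacement
holder class and an applyAll helper (neither is Solr's actual token type or
the SpellCheckCollator code), might look like this:

```java
import java.util.Arrays;
import java.util.List;

public class OffsetReplaceSketch {
    /** Hypothetical holder: span in the ORIGINAL query plus the replacement text. */
    static final class Replacement {
        final int startOffset;
        final int endOffset;
        final String text;

        Replacement(int startOffset, int endOffset, String text) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.text = text;
        }
    }

    /** Applies replacements that must be sorted left to right and must not overlap. */
    static String applyAll(String query, List<Replacement> replacements) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0; // net length change caused by the replacements applied so far
        for (Replacement r : replacements) {
            sb.replace(r.startOffset + offset, r.endOffset + offset, r.text);
            // A longer replacement shifts later spans right, a shorter one shifts them left.
            offset += r.text.length() - (r.endOffset - r.startOffset);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Replacement> fixes = Arrays.asList(
            new Replacement(0, 4, "hello"),   // "helo" -> "hello", offset becomes +1
            new Replacement(5, 9, "world"));  // "wrld" now sits at 6..10
        System.out.println(applyAll("helo wrld", fixes)); // prints "hello world"
    }
}
```

The sketch only works because each span lies strictly to the right of the
previous one, which is the assumption discussed next.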
So it is absolutely necessary that the assumption (replacements are done left
to right) is correct.
The exception happens when multiple replacements (corrections) point to the
same character position in the original query. How can that happen? By
configuring the spellchecker with a queryAnalyzerFieldType whose query
analyzer uses a synonym list that replaces one token with multiple tokens.
That might not be the common case, but it is possible.
In that case a single token in the original query can be expanded into
multiple tokens by the synonym. All of these refer to the same original query
token, so they share the same startOffset and endOffset. If the spell checker
finds corrections for more than one of these tokens, the collation code will
try to replace the same original query token multiple times, always using the
same startOffset and endOffset plus the accumulated offset. If the length
changes, that is no longer correct. Example:
original query: "myname"
synonym list: myname, some1 some2 myname, some3 some4 myname
Token list going into spell checker collation: myname, some1, some2, some3,
some4
Assumed spelling corrections: some1 => some, some2 => some, some3 => some,
some4 => some
Replacement happening on original query text "myname":
* myname => some (startOffset 0, endOffset 6, offset 0; new offset -2 because
replacement string is 2 chars shorter, so everything after the replacement
would move 2 positions left)
* some => exception (startOffset 0-2 = -2, endOffset 6-2 = 4, ...)
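The failing second step can be reproduced outside Solr with a plain
StringBuilder. The following is a small sketch with a hypothetical
replaceTwice helper, not the SpellCheckCollator code itself; the exact
exception message differs between JDK versions, so it is not shown here.

```java
public class SharedOffsetDemo {
    /**
     * Applies the same (startOffset, endOffset) span twice, the way two
     * corrections for tokens expanded from one query token would.
     * Returns the exception that was thrown, or null if none was thrown.
     */
    static RuntimeException replaceTwice(String query, int startOffset,
                                         int endOffset, String correction) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0; // accumulated length change, as in the collation loop
        try {
            for (int i = 0; i < 2; i++) {
                sb.replace(startOffset + offset, endOffset + offset, correction);
                offset += correction.length() - (endOffset - startOffset);
            }
            return null;
        } catch (StringIndexOutOfBoundsException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // First pass: replace(0, 6) gives "some" and offset becomes -2.
        // Second pass: replace(-2, 4) has a negative start index and throws.
        RuntimeException e = replaceTwice("myname", 0, 6, "some");
        System.out.println(e == null ? "no exception" : e.getClass().getSimpleName());
    }
}
```

When the correction has the same length as the original token, offset stays 0
and no exception is thrown, which is one reason the problem does not show up
with every synonym configuration.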
It is a bit harder to reproduce than that, because corrections with
positionIncrement == 0 are completely ignored. I couldn't find out exactly
when this happens, but simple synonym lists often result in a
positionIncrement of 0.
IMHO the code is wrong in assuming that every replacement refers to a
different token in the original query string. This is no longer true when
synonym replacement by a query analyzer comes into play.
As a workaround, I think it is reasonable to harden the code, although that
does not fix the root cause. If replacement tokens overlap or are not in
left-to-right order, the code should skip the overlapping or out-of-order
replacements.
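The idea of skipping such replacements can be sketched as follows. This is
only an illustration under assumptions, not the attached patch: the
Replacement class, the applyInOrder helper and the lastEnd variable are
hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class HardenedCollationSketch {
    /** Hypothetical holder: span in the ORIGINAL query plus the replacement text. */
    static final class Replacement {
        final int startOffset;
        final int endOffset;
        final String text;

        Replacement(int startOffset, int endOffset, String text) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.text = text;
        }
    }

    /** Applies replacements left to right and skips overlapping or out-of-order ones. */
    static String applyInOrder(String query, List<Replacement> replacements) {
        StringBuilder sb = new StringBuilder(query);
        int offset = 0;  // accumulated length change so far
        int lastEnd = 0; // end of the last applied span, in original coordinates
        for (Replacement r : replacements) {
            boolean invalidSpan = r.endOffset < r.startOffset || r.endOffset > query.length();
            boolean overlapsOrOutOfOrder = r.startOffset < lastEnd;
            if (invalidSpan || overlapsOrOutOfOrder) {
                continue; // skip instead of failing with an index error
            }
            sb.replace(r.startOffset + offset, r.endOffset + offset, r.text);
            offset += r.text.length() - (r.endOffset - r.startOffset);
            lastEnd = r.endOffset;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four corrections that all point at the same original token "myname".
        List<Replacement> fixes = Arrays.asList(
            new Replacement(0, 6, "some"), new Replacement(0, 6, "some"),
            new Replacement(0, 6, "some"), new Replacement(0, 6, "some"));
        System.out.println(applyInOrder("myname", fixes)); // prints "some"
    }
}
```

Only the first correction for a given span is applied here, so the resulting
collation may not be the best one, but no exception is thrown.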
I will attach a suggested workaround patch.
I think that using a queryAnalyzerFieldType in a spellchecker config with
collation is simply not supported when the field type's query analyzer can
produce a list of tokens in which multiple tokens refer to the same original
string position in the query, as synonyms can. That should be documented.
> StringIndexOutOfBoundsException: String index out of range: -3
> --------------------------------------------------------------
>
> Key: SOLR-13360
> URL: https://issues.apache.org/jira/browse/SOLR-13360
> Project: Solr
> Issue Type: Bug
> Affects Versions: 7.2.1
> Environment: Solr 7.2.1 - SAP Hybris 6.7.0.8
> Reporter: Ahmed Ghoneim
> Priority: Critical
> Attachments: managed-schema, managed-schema, resources.json,
> solr-config.zip
>
>
> *{color:#ff0000}I cannot execute the following query:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=true{noformat}
> 4/1/2019, 1:16:07 PM ERROR true RequestHandlerBase
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> {code:java}
> java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> at
> java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:851)
> at java.lang.StringBuilder.replace(StringBuilder.java:262)
> at
> org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:252)
> at
> org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:94)
> at
> org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:297)
> at
> org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:209)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
> at
> org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> 4/1/2019, 1:16:07 PM ERROR true HttpSolrCall
> null:java.lang.StringIndexOutOfBoundsException: String index out of range: -3
> *{color:#14892c}However the following query works:{color}*
> {noformat}
> http://localhost:8983/solr/master_Project_Product_flip/suggest?q=duotop&spellcheck.q=duotop&qt=/suggest&spellcheck.dictionary=de&spellcheck.collate=false{noformat}
> Note: there's a synonym
> {noformat}
> duotop -> Duo Top
> {noformat}