[ https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163103#comment-13163103 ]
James Dyer commented on SOLR-2509:
----------------------------------
Steffen's changes are most certainly correct. The index contains "pixmaa" and
we are querying on "pixma-a-b-c-d-e-f-g". The spelling index is using analyzer
"lowerpunctfilt" (solrconfig-spellcheckcomponent.xml, line 44) which includes
WordDelimiterFilter and "generateWordParts=1". So we would expect this query
to tokenize down to "pixma" "a" "b" "c" "d" "e" "f" "g". As the Collate
feature is only supposed to replace the misspelled token with the new one, I
wonder why this test scenario would expect all 8 tokens to be replaced by 1
token (!).

Indeed, this test scenario was added during a refactoring (r1022768) with no
JIRA # or bug mentioned at all in the comments. So we can't know for sure why
it was added. I'm thinking this is invalid. I would expect the correct
collation to be "pixma-a-b-c-d-e-f-g".
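To illustrate why per-token offsets matter for collation, here is a minimal Java sketch. It is not Solr's collation code: the class CollationSketch, the helper replaceSpan, and the suggestion "pixmaa" for the token "pixma" are all assumptions made up for this example. It simply splices a suggestion into the original query at a token's start and end offsets, using the two offset pairs reported for "pixma" further down in this comment.

```java
/** Illustrative sketch only; this is not Solr's collation implementation. */
class CollationSketch {

    /** Replaces the characters in [start, end) of the query with the suggestion. */
    static String replaceSpan(String query, int start, int end, String suggestion) {
        return new StringBuilder(query).replace(start, end, suggestion).toString();
    }

    public static void main(String[] args) {
        String query = "pixma-a-b-c-d-e-f-g";
        String suggestion = "pixmaa"; // hypothetical suggestion for the token "pixma"

        // Offsets so=0 eo=5: only the first token is replaced.
        System.out.println(replaceSpan(query, 0, 5, suggestion));

        // Offsets so=0 eo=19: the replacement overwrites the entire query.
        System.out.println(replaceSpan(query, 0, 19, suggestion));
    }
}
```

With offsets 0 and 5 the other seven tokens survive the replacement. With offsets 0 and 19 the same call swallows the whole query string, leaving a single token.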

Just for grins, I put a "println" in SpellingQueryConverter to show the start &
end offsets for each token before and after the patch. In both cases, we get
the same token texts, but prior to the patch the offset values are clearly
wrong: every token reports the offsets of the whole 19-character query.
--before:
TOKEN: pixma so=0 eo=19
TOKEN: a so=0 eo=19
TOKEN: b so=0 eo=19
TOKEN: c so=0 eo=19
TOKEN: d so=0 eo=19
TOKEN: e so=0 eo=19
TOKEN: f so=0 eo=19
TOKEN: g so=0 eo=19
TOKEN: pixmaabcdefg so=0 eo=19
--after:
TOKEN: pixma so=0 eo=5
TOKEN: a so=6 eo=7
TOKEN: b so=8 eo=9
TOKEN: c so=10 eo=11
TOKEN: d so=12 eo=13
TOKEN: e so=14 eo=15
TOKEN: f so=16 eo=17
TOKEN: g so=18 eo=19
TOKEN: pixmaabcdefg so=0 eo=19
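The "after" offsets for the eight word parts can be reproduced with a small sketch that splits on non-alphanumeric characters and records where each part starts and ends. This is a simplified stand-in, not WordDelimiterFilter itself: it ignores case changes and letter/digit transitions, and it does not emit the catenated token "pixmaabcdefg". The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for word-part splitting; not the real WordDelimiterFilter. */
class WordPartOffsets {

    /** Returns one "TOKEN: text so=start eo=end" line per alphanumeric run. */
    static List<String> describe(String input) {
        List<String> out = new ArrayList<String>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            // Skip delimiter characters such as '-'.
            while (i < n && !Character.isLetterOrDigit(input.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one run of letters and digits.
            while (i < n && Character.isLetterOrDigit(input.charAt(i))) {
                i++;
            }
            if (i > start) {
                // The end offset is exclusive, matching the log format above.
                out.add("TOKEN: " + input.substring(start, i) + " so=" + start + " eo=" + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("pixma-a-b-c-d-e-f-g")) {
            System.out.println(line);
        }
    }
}
```

Each part gets its own span inside the original string, which is what lets a later step replace one token without touching its neighbours.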
> spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
> --------------------------------------------------------------------------
>
> Key: SOLR-2509
> URL: https://issues.apache.org/jira/browse/SOLR-2509
> Project: Solr
> Issue Type: Bug
> Affects Versions: 3.1
> Environment: Debian Lenny
> JAVA Version "1.6.0_20"
> Reporter: Thomas Gambier
> Assignee: Erick Erickson
> Priority: Blocker
> Attachments: SOLR-2509.patch, SOLR-2509.patch, document.xml,
> schema.xml, solrconfig.xml
>
>
> Hi,
> I'm a French user of Solr and I've encountered a problem since installing
> Solr 3.1.
> I get an error with this query:
> cle_frbr:"LYSROUGE1149-73190"
> *SEE COMMENTS BELOW*
> I tried escaping the minus character and the query worked:
> cle_frbr:"LYSROUGE1149(BACKSLASH)-73190"
> But, strangely, if I change one letter in my query it works:
> cle_frbr:"LASROUGE1149-73190"
> I tested the same query on Solr 1.4 and it works!
> Can someone test the query on the next line on Solr 3.1 and tell me whether
> they have the same problem?
> yourfield:"LYSROUGE1149-73190"
> Where does the problem come from?
> Thank you in advance for your help.
> Tom