[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614086#comment-13614086 ]
Robert Muir commented on LUCENE-4880:
-------------------------------------

I also think it's stupid that you get 0640 as a token by itself in any case. I don't agree with the Unicode property of "letter" for this character, as that doesn't make sense to me; in my opinion it should be "format". I sure hope there is some good reason for this, but to me it's crazy.

> Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory
> --------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4880
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4880
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>   Affects Versions: 4.2
>         Environment: Windows 7 (probably irrelevant)
>            Reporter: Timothy Allison
>         Attachments: MemoryIndexVsRamDirZeroLengthTermTest.java
>
>
> MemoryIndex skips tokens that have length == 0 when building the index; the
> result is that it does not increment the token offset (nor does it store the
> position offsets if that option is set) for tokens of length == 0. A regular
> index (via, say, RAMDirectory) does not appear to do this.
> When using the ICUFoldingFilter, it is possible to have a term of zero length
> (the \u0640 character separated by spaces). If that occurs in a document,
> the offsets returned at search time differ between the MemoryIndex and a
> regular index.
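
To make the reported mismatch concrete, here is a minimal plain-Java sketch that does not use Lucene at all. The names ZeroLengthTermSketch, Token, Posting, index, and sampleTokens are hypothetical, and the model assumes that "skipping" a zero-length term means neither advancing the position counter nor recording the token's offsets, which is one reading of the issue description. It also checks, via the JDK's own Unicode tables, that \u0640 is classified as a modifier letter, which is the "letter" property the comment objects to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ZeroLengthTermSketch {

    // Hypothetical stand-in for an analyzed token: term text plus character offsets.
    public static final class Token {
        public final String term;
        public final int start;
        public final int end;

        public Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical stand-in for what an index records per token.
    public static final class Posting {
        public final String term;
        public final int position;
        public final int start;
        public final int end;

        public Posting(String term, int position, int start, int end) {
            this.term = term;
            this.position = position;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "'" + term + "'@pos" + position + "[" + start + "," + end + ")";
        }
    }

    // skipEmptyTerms = true models the behavior the issue attributes to MemoryIndex;
    // false models a regular index that keeps zero-length terms.
    public static List<Posting> index(List<Token> tokens, boolean skipEmptyTerms) {
        List<Posting> postings = new ArrayList<>();
        int position = -1;
        for (Token t : tokens) {
            if (skipEmptyTerms && t.term.isEmpty()) {
                continue; // assumption: position is not advanced, offsets are not stored
            }
            position++;
            postings.add(new Posting(t.term, position, t.start, t.end));
        }
        return postings;
    }

    // Tokens for the text "a \u0640 b" after a folding step that reduces
    // the middle token to the empty string (as the issue describes).
    public static List<Token> sampleTokens() {
        return Arrays.asList(new Token("a", 0, 1), new Token("", 2, 3), new Token("b", 4, 5));
    }

    // True if the JDK classifies U+0640 (ARABIC TATWEEL) as a modifier letter (Lm).
    public static boolean tatweelIsModifierLetter() {
        return Character.getType('\u0640') == Character.MODIFIER_LETTER;
    }

    public static void main(String[] args) {
        System.out.println("skipping empty terms: " + index(sampleTokens(), true));
        System.out.println("keeping empty terms:  " + index(sampleTokens(), false));
        System.out.println("U+0640 is a modifier letter: " + tatweelIsModifierLetter());
    }
}
```

In this model the term "b" ends up at position 1 when empty terms are skipped and at position 2 when they are kept, so any consumer comparing the two indexes sees different positional data for the same document. The attached MemoryIndexVsRamDirZeroLengthTermTest.java remains the authoritative reproduction against real Lucene classes.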