[ https://issues.apache.org/jira/browse/LUCENE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15978389#comment-15978389 ]
ASF subversion and git services commented on LUCENE-7791:
---------------------------------------------------------

Commit cd1f23c63abe03ae650c75ec8ccb37762806cc75 in lucene-solr's branch refs/heads/branch_6_5 from [~jimczi]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cd1f23c ]

LUCENE-7791: fix AIOOBE on NormValuesWriter too

> AIOOBE on flush+sort
> --------------------
>
>                 Key: LUCENE-7791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7791
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>   Affects Versions: 6.5
>            Reporter: Przemysław Szeremiota
>              Labels: patch
>             Fix For: master (7.0), 6.6, 6.5.1
>
>         Attachments: LUCENE-7791.patch, sortflush.patch, sortflush-test.patch
>
>
> On the released 6.5.0 version, flushing a sorted index throws
> ArrayIndexOutOfBoundsException in NumericDocValuesWriter, NormValuesWriter and
> BinaryDocValuesWriter.
> The new SortedXXXIterators look up documents in FixedBitSets or
> PackedValues based on the remapped (sorted) document ID, without checking the
> BitSet/Values ranges, which are based on original document IDs. Meanwhile,
> FixedBitSets can be sparse not only between documents with fields, but
> also after the last (original) document with a given field (because the
> writer's addValue() is not called for trailing documents without values for
> the field). So the remapped (sorted) range can have a different range of
> useful values, and bounds checking should be done for the remapped ID and not
> the original one.
> We were hit by this bug because our indexes are built from independent
> sources by partially updating fragments of documents, so there are always
> some documents without values in some fields.
> As I understand this bug, it shows up when:
> - maxDoc is greater than 64 (64 is the pre-allocated size of the writers'
>   FixedBitSets)
> - some number of the last documents added have empty fields (so the
>   FixedBitSet won't be reallocated to maxDoc)
> Also, the range check on values for a given field now happens based on the
> original ID (e.g. "upto < size"), so flushing can now lose some values, even
> without hitting AIOOBE.
> I will attach a patch resolving the issues with some writers; for the other
> writers from LUCENE-7579, I am not sure whether they have similar bugs. The
> patch resolved our indexing issues; please check the changes from LUCENE-7579
> to confirm that there are no additional bugs in the other flush-sorting
> writers.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
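
The failure mode described in the report can be reproduced in isolation. The following is a minimal sketch in plain Java, not Lucene's code: TinyBitSet, reversedDocMap and the two lookup methods are hypothetical stand-ins, and the direction of the sort map (new ID to old ID) is an assumption made for illustration. It only shows that an unchecked lookup through a sort map can index past a bit set that was never grown to maxDoc, and that a range check on the ID actually used for the lookup avoids this.

```java
import java.util.ArrayList;
import java.util.List;

class SortedFlushSketch {

    /** Hypothetical stand-in for a fixed-size bit set; get() does no range check. */
    static final class TinyBitSet {
        private final long[] words;

        TinyBitSet(int numBits) {
            words = new long[(numBits + 63) >>> 6];
        }

        void set(int index) {
            words[index >>> 6] |= 1L << index; // Java takes the shift distance mod 64
        }

        boolean get(int index) {
            return (words[index >>> 6] & (1L << index)) != 0;
        }

        int length() {
            return words.length << 6;
        }
    }

    /** Hypothetical sort map that reverses document order: new ID i maps to old ID maxDoc-1-i. */
    static int[] reversedDocMap(int maxDoc) {
        int[] newToOld = new int[maxDoc];
        for (int newId = 0; newId < maxDoc; newId++) {
            newToOld[newId] = maxDoc - 1 - newId;
        }
        return newToOld;
    }

    /** Unchecked lookup: fails when a mapped ID lies beyond the allocated bits. */
    static List<Integer> unsafeDocsWithValue(TinyBitSet docsWithField, int[] newToOld) {
        List<Integer> result = new ArrayList<>();
        for (int newId = 0; newId < newToOld.length; newId++) {
            if (docsWithField.get(newToOld[newId])) {
                result.add(newId);
            }
        }
        return result;
    }

    /** Checked lookup: an ID beyond the allocated bits is treated as "no value". */
    static List<Integer> safeDocsWithValue(TinyBitSet docsWithField, int[] newToOld) {
        List<Integer> result = new ArrayList<>();
        for (int newId = 0; newId < newToOld.length; newId++) {
            int lookupId = newToOld[newId];
            if (lookupId < docsWithField.length() && docsWithField.get(lookupId)) {
                result.add(newId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 70;                               // more than the 64 pre-allocated bits
        TinyBitSet docsWithField = new TinyBitSet(64); // never grown to maxDoc
        docsWithField.set(0);                          // only the first three documents
        docsWithField.set(1);                          // have a value for the field
        docsWithField.set(2);
        int[] newToOld = reversedDocMap(maxDoc);

        try {
            unsafeDocsWithValue(docsWithField, newToOld);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unchecked lookup failed: " + e.getMessage());
        }
        System.out.println("checked lookup: " + safeDocsWithValue(docsWithField, newToOld));
    }
}
```

The sketch uses both conditions listed in the report: maxDoc exceeds 64, and the trailing documents have no value, so the bit set stays at its initial size. It does not model the separate "upto < size" value-loss case.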