[ https://issues.apache.org/jira/browse/LUCENE-7304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311833#comment-15311833 ]
Paul Elschot commented on LUCENE-7304:
--------------------------------------

It might be simpler to try and let EliasFanoDocIdSet extend BitSet, even though it cannot implement MutableBits.

There is a dilemma here: either introduce DocBlocksIterator, or leave MutableBits unimplemented. The question is which would be preferable in the long term for the block join queries: DocBlocksIterator or BitSet? DocBlocksIterator is read-only and might involve a little overhead. BitSet supports mutation, but that is not needed for the block join queries.

> Doc values based block join implementation
> ------------------------------------------
>
>                 Key: LUCENE-7304
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7304
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Martijn van Groningen
>            Priority: Minor
>         Attachments: LUCENE-5092-20140313.patch, LUCENE-7304-20160531.patch, LUCENE_7304.patch
>
>
> At query time the block join relies on a bitset for finding the previous
> parent doc while advancing the doc id iterator. On large indices these
> bitsets can consume large amounts of JVM heap space. Also, due to the
> nature of how these bitsets are set, the 'FixedBitSet' implementation is
> typically used.
> The idea I had was to replace the bitset usage with a numeric doc values
> field that stores offsets. Each child doc stores how many docids it is away
> from its parent doc, and each parent stores how many docids it is away from
> its first child. At query time this information can be used to perform the
> block join.
> I think another benefit of this approach is that external tools can then
> easily determine whether a doc is part of a block of documents, and perhaps
> this also helps with index time sorting?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
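A minimal sketch of the offset scheme described in the issue, written in plain Java with no Lucene dependency. The class name BlockOffsetsSketch, the int[] standing in for a per-segment numeric doc values field, and the sign convention used to tell parents from children are all hypothetical. The issue text does not say how parents and children are distinguished, so here children store a positive distance to their parent and parents store the negated distance to their first child. It assumes the usual block layout in which each block's children are indexed before their parent. prevParent is meant to return what BitSet.prevSetBit would return on a parent bitset.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: an int[] stands in for a numeric doc values field.
 * Assumed sign convention (not from the issue): a child doc stores a positive
 * distance to its parent; a parent doc stores the negated distance to its
 * first child (0 when the parent has no children).
 */
public final class BlockOffsetsSketch {
    private final int[] offsets;

    BlockOffsetsSketch(int[] offsets) {
        this.offsets = offsets;
    }

    /** Builds offsets from per-block child counts; each block is its children followed by its parent. */
    static BlockOffsetsSketch fromBlockSizes(int... childCounts) {
        int total = Arrays.stream(childCounts).map(c -> c + 1).sum();
        int[] offsets = new int[total];
        int doc = 0;
        for (int children : childCounts) {
            int parent = doc + children;
            for (int child = doc; child < parent; child++) {
                offsets[child] = parent - child;   // distance from child to its parent
            }
            offsets[parent] = -(parent - doc);     // negated distance from parent to first child
            doc = parent + 1;
        }
        return new BlockOffsetsSketch(offsets);
    }

    boolean isParent(int doc) {
        return offsets[doc] <= 0;
    }

    int parentOf(int doc) {
        return isParent(doc) ? doc : doc + offsets[doc];
    }

    int firstChildOf(int parent) {
        return parent + offsets[parent];           // offsets[parent] <= 0
    }

    /** Previous parent at or before doc (like prevSetBit on a parent bitset), or -1 if none. */
    int prevParent(int doc) {
        if (isParent(doc)) {
            return doc;
        }
        // All docs before this block's first child belong to earlier blocks,
        // so the doc just before the first child is the previous parent (or -1).
        return firstChildOf(parentOf(doc)) - 1;
    }

    public static void main(String[] args) {
        // Blocks: 2 children + parent, 0 children + parent, 3 children + parent.
        BlockOffsetsSketch s = fromBlockSizes(2, 0, 3);
        System.out.println(Arrays.toString(s.offsets));
        System.out.println(s.parentOf(4) + " " + s.firstChildOf(7) + " " + s.prevParent(5));
    }
}
```

With blocks of 2, 0 and 3 children, main prints [2, 1, -2, 0, 3, 2, 1, -3] followed by 7 4 3. Every lookup here is read-only, which matches the observation above that the block join queries do not need mutability.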