[ https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496276#comment-14496276 ]
Luc Vanlerberghe edited comment on LUCENE-6427 at 4/15/15 2:49 PM: ------------------------------------------------------------------- I still have some issues with the ensureCapacity methods: The doc comment states: {code} * If the given {@link FixedBitSet} is large enough to hold {@code numBits}, * returns the given bits, otherwise returns a new {@link FixedBitSet} which * can hold the requested number of bits. {code} while actually it checks if it is large enough to hold (numBits+1) bits (I already changed the doc comment in one of the commits of my pull request to reflect this). In lucene/solr the typical usage is: # {code} docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID); docsWithField.set(docID); {code} but also: # {code} newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length()); newbits.or(otherDocSet.bits); {code} (1) only works because the doc comment doesn't correspond to the implementation. Correct usage would be ... ensureCapacity(docsWithField, docID + 1) (2) will unexpectly grow newBits even when it is exactly the same size. The implementation is written as if the numBits argument is a "number of bits" value, but then proceeds to allocate at least an extra long in the backing array... There are several options here: # Only update the doc comment (like I did) so unsuspecting users don't get an unexpected performance hit when manipulating equal sized bitsets, but then the name stays awkward. # Fix the implementation and update all locations in lucene/solr where it is used (but this may/will affect custom modules without warning) # Rename the methods and "numBits" argument (ensureIndexAvailable anyone?). This will break the compilation of custom modules, but it's @lucene.internal anyway, not a public api. was (Author: lvl): I still have some issues with the ensureCapacity methods: The doc comment states: {code} * If the given {@link FixedBitSet} is large enough to hold {@code numBits}, * returns the given bits, otherwise returns a new {@link FixedBitSet} which * can hold the requested number of bits. {code} while actually it checks if it is large enough to hold (numBits+1) bits (I already changed the doc comment in one of the commits of my pull request to reflect this). In lucene/solr the typical usage is: # {code} docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID); docsWithField.set(docID); {code} but also: # {code} newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length()); newbits.or(otherDocSet.bits); {code} (1) only works because the doc comment doesn't correspond to the implementation. Correct usage would be ... ensureCapacity(docsWithField, docID + 1) (2) will unexpectly grow newBits even when it is exactly the same size. The implementation is written as if the numBits argument is a "number of bits" value, but then proceeds to allocate at least an extra long in the backing array... There are several options here: # Only update the doc comment (like I did) so unsuspecting users don't get an unexpected performance hit when manipulating equal sized bitsets, but then the name stays awkward. # Fix the implementation and update all locations in lucene/solr where it is used (but this may/will affect custom modules without warning) # Rename the methods and "numBits" argument (ensureIndexAvailable anyone?). This will break the compilation of custom modules, but it's @lucene.internal anyway, not a public api. > BitSet fixes - assert on presence of 'ghost bits' and others > ------------------------------------------------------------ > > Key: LUCENE-6427 > URL: https://issues.apache.org/jira/browse/LUCENE-6427 > Project: Lucene - Core > Issue Type: Bug > Components: core/other > Reporter: Luc Vanlerberghe > > Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and > corresponding tests: > * Some methods rely on the fact that no bits are set after numBits (what I > call 'ghost' bits here). > ** cardinality, nextSetBit, intersects and others may yield wrong results > ** If ghost bits are present, they may become visible after ensureCapacity is > called. > ** The tests deliberately create bitsets with ghost bits, but then do not > detect these failures > * FixedBitSet.cardinality scans the complete backing array, even if only > numWords are in use -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org