[ 
https://issues.apache.org/jira/browse/LUCENE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496276#comment-14496276
 ] 

Luc Vanlerberghe edited comment on LUCENE-6427 at 4/15/15 2:49 PM:
-------------------------------------------------------------------

I still have some issues with the ensureCapacity methods:
The doc comment states:
{code}
   * If the given {@link FixedBitSet} is large enough to hold {@code numBits},
   * returns the given bits, otherwise returns a new {@link FixedBitSet} which
   * can hold the requested number of bits.
{code}
while actually it checks if it is large enough to hold (numBits+1) bits (I 
already changed the doc comment in one of the commits of my pull request to 
reflect this).
In lucene/solr the typical usage is:
# {code}
    docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID);
    docsWithField.set(docID);
{code}
but also:
# {code}
      newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length());
      newbits.or(otherDocSet.bits);
{code}

(1) only works because the doc comment doesn't correspond to the 
implementation.  Correct usage would be ... ensureCapacity(docsWithField, docID 
+ 1)
(2) will unexpectly grow newBits even when it is exactly the same size.

The implementation is written as if the numBits argument is a "number of bits" 
value, but then proceeds to allocate at least an extra long in the backing 
array...

There are several options here:
# Only update the doc comment (like I did) so unsuspecting users don't get an 
unexpected performance hit when manipulating equal sized bitsets, but then the 
name stays awkward.
# Fix the implementation and update all locations in lucene/solr where it is 
used (but this may/will affect custom modules without warning)
# Rename the methods and "numBits" argument (ensureIndexAvailable anyone?).  
This will break the compilation of custom modules, but it's @lucene.internal 
anyway, not a public api.



was (Author: lvl):
I still have some issues with the ensureCapacity methods:
The doc comment states:
{code}
   * If the given {@link FixedBitSet} is large enough to hold {@code numBits},
   * returns the given bits, otherwise returns a new {@link FixedBitSet} which
   * can hold the requested number of bits.
{code}
while actually it checks if it is large enough to hold (numBits+1) bits (I 
already changed the doc comment in one of the commits of my pull request to 
reflect this).
In lucene/solr the typical usage is:
# {code}
    docsWithField = FixedBitSet.ensureCapacity(docsWithField, docID);
    docsWithField.set(docID);
{code}
but also:
# {code}
      newbits = FixedBitSet.ensureCapacity(newbits, otherDocSet.bits.length());
      newbits.or(otherDocSet.bits);
{code}
(1) only works because the doc comment doesn't correspond to the 
implementation.  Correct usage would be ... ensureCapacity(docsWithField, docID 
+ 1)
(2) will unexpectly grow newBits even when it is exactly the same size.

The implementation is written as if the numBits argument is a "number of bits" 
value, but then proceeds to allocate at least an extra long in the backing 
array...

There are several options here:
# Only update the doc comment (like I did) so unsuspecting users don't get an 
unexpected performance hit when manipulating equal sized bitsets, but then the 
name stays awkward.
# Fix the implementation and update all locations in lucene/solr where it is 
used (but this may/will affect custom modules without warning)
# Rename the methods and "numBits" argument (ensureIndexAvailable anyone?).  
This will break the compilation of custom modules, but it's @lucene.internal 
anyway, not a public api.


> BitSet fixes - assert on presence of 'ghost bits' and others
> ------------------------------------------------------------
>
>                 Key: LUCENE-6427
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6427
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/other
>            Reporter: Luc Vanlerberghe
>
> Fixes after reviewing org.apache.lucene.util.FixedBitSet, LongBitSet and 
> corresponding tests:
> * Some methods rely on the fact that no bits are set after numBits (what I 
> call 'ghost' bits here).
> ** cardinality, nextSetBit, intersects and others may yield wrong results
> ** If ghost bits are present, they may become visible after ensureCapacity is 
> called.
> ** The tests deliberately create bitsets with ghost bits, but then do not 
> detect these failures
> * FixedBitSet.cardinality scans the complete backing array, even if only 
> numWords are in use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to