[ 
https://issues.apache.org/jira/browse/LUCENE-3449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113371#comment-13113371
 ] 

Michael McCandless commented on LUCENE-3449:
--------------------------------------------


bq. We could allocate an extra bit on construction (round up if the last word 
is full) to avoid the extra if statement.

+1 for adding a bit guard at the end; that's a great idea!

But this only works for the nextSetBit case right?  So I don't think
prevSetBit should accept -1.

Let's clearly jdoc that nextSetBit is an exception to the other
methods (ie it allows for exactly 1 (= length()) out-of-bounds index).

bq. but in this case it may be misleading because the API is already mirroring 
juc.BitSet (naming convention), so the contract comes as an assumption.

Well, still we shouldn't let BitSet's mistakes live on through the
method names it had chosen.  I mean at some point someone has to stand
up to BitSet's mistakes ;)  It may as well be us.

I suppose we could also change our names if necessary.

{quote}
bq. Well... I doubt this is easily measured empirically?

Exactly. I think the difference will be minor, if statistically significant at 
all (when running inside a larger program and not in a microbenchmark).
{quote}

Yet the cost is non-zero.

Why add a some cost, any cost, even if it's small, when you don't have
to?

Over time such minor slowdowns can eventually add up, and only by
being vigilant to changes in Lucene's hotspots can we all keep Lucene
fast and lean.

This logic only applies to potential "hotspots", like these methods.
For non-hotpots I fully agree that convenience, readability,
convention, etc., are more important.  But "down low", performance
trumps these other concerns.

Different rules apply down low.  Just like quantum mechanics!


> Fix FixedBitSet.nextSetBit/prevSetBit to support the common usage pattern in 
> every programming book
> ---------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3449
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3449
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/other
>    Affects Versions: 3.4, 4.0
>            Reporter: Uwe Schindler
>            Priority: Minor
>         Attachments: LUCENE-3449.patch
>
>
> The usage pattern for nextSetBit/prevSetBit is the following:
> {code:java}
> for(int i=bs.nextSetBit(0); i>=0; i=bs.nextSetBit(i+1)) {
>  // operate on index i here
> }
> {code}
> The problem is that the i+1 at the end can be bs.length(), but the code in 
> nextSetBit does not allow this (same applies to prevSetBit(0)). The above 
> usage pattern is in every programming book, so it should really be supported. 
> The check has to be done in all cases (with the current impl in the calling 
> code).
> If the check is done inside xxxSetBit() it can also be optimized to be only 
> called seldom and not all the time, like in the ugly looking replacement, 
> thats currently needed:
> {code:java}
> for(int i=bs.nextSetBit(0); i>=0; i=(i<bs.length()-1) ? bs.nextSetBit(i+1) : 
> -1) {
>  // operate on index i here
> }
> {code}
> We should change this and allow out-of bounds indexes for those two methods 
> (they already do some checks in that direction). Enforcing this with an 
> assert is unuseable on the client side.
> The test code for FixedBitSet also uses this, horrible. Please support the 
> common usage pattern for BitSets.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to