>> Shape Discussion:
>>
>> as for getNumberOfBytes() it should return the maximum number of bytes
>> returned by a getBits() call to a filter with this shape.  So yes, if
there
>> is a compressed internal representation, no it won't be that.  It is a
>> method on Shape so it should literally be Math.ceil( getNumberOfBits() /
>> 8.0 )
>>
>> Basically, if you want to create an array that will fit all the bits
>> returned by BloomFilter.iterator() you need an array of
>> Shape.getNumberOfBytes().  And that is actually what I use it for.

>Then you are also mapping the index to a byte index and a bit within the
byte. So if you are doing these two actions then this is something that you
should control.

BloomFilter.getBits returns a long[].  that long[] may be shorter than the
absolute number of bytes specified by Shape.  It also may be longer.

If you want to create a copy of the byte[] you have to know how long it
should be.  The only way to determine that is from Shape, and currently
only if you do the Ceil() method noted above.  There is a convenience in
knowing how long (in bytes) the buffer can be.



On Wed, Mar 18, 2020 at 2:19 PM Claude Warren <cla...@xenei.com> wrote:

> We are getting to the point where there are a lot of options that
> determine which implementation is "best".  We could take a stab at creating
> a BloomFIlterFactory that takes a Shape as an argument and does a "finger
> in the air" guestimate of which implementation best fits.  Store values in
> long blocks or as integers in a list, that sort of thing.  Perhaps in a
> month or so when we really have some idea.
>
>
>
> On Wed, Mar 18, 2020 at 2:16 PM Claude Warren <cla...@xenei.com> wrote:
>
>> You don't need Iterator<IndexCount> iterator() as we have forEachCount(
>> BitCountConsumer )
>>
>> I guess we need something like add( Iterator<IndexCount>) or add(
>> Collection<IndexCount> ) or add( Stream<IndexCount> )
>>
>> It would be nice if we could have a BitCountProducer class that we could
>> just pass to an add() method.
>>
>>
>>
>>
>> On Wed, Mar 18, 2020 at 11:50 AM Alex Herbert <alex.d.herb...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> > On 18 Mar 2020, at 11:14, Claude Warren <cla...@xenei.com> wrote:
>>> >
>>> > On a slightly different note.  CountingBloomFilters have no way to
>>> perform
>>> > a reload.  All other bloom filters you can dump the bits and reload
>>> > (trivial) but if you preserve the counts from a bloom filter and want
>>> to
>>> > reload them you can't.  We need a constructor that takes the
>>> index,count
>>> > pairs somehow.
>>>
>>> Iterator<int[]> ?
>>>
>>> Or foolproof:
>>>
>>> class IndexCount {
>>>     final int index;
>>>     final int count;
>>>     // ...
>>> }
>>>
>>> Iterator<IndexCount>
>>>
>>>
>>> The CountingBloomFilter already has a method forEachCount(…).
>>>
>>> I was reluctant to add some sort of iterator:
>>>
>>> Iterator<?> iterator()
>>>
>>> But we could put in:
>>>
>>> Iterator<IndexCount> iterator()
>>>
>>> It would be inefficient but at least it is fool-proof. The operation is
>>> unlikely to be used very often.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>
>>>
>>
>> --
>> I like: Like Like - The likeliest place on the web
>> <http://like-like.xenei.com>
>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>
>
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to