Hi.
Le sam. 14 sept. 2019 à 08:15, Claude Warren <[email protected]> a écrit :
>
> @Gilles
>
> I am happy to rename the package without the plural if that is the
> standard, I will also fix the indent issue. Is there a definition that can
> be quickly imported into Eclipse to do the proper formatting?
Hopefully, someone else can answer that one.
> I am adding/updating all comments in the code.
>
> FilterConfig contains a main method to make it easy to assess how the
> various settings influence the length and number of functions in the
> filter. This could go into another class or a Utility class but neither of
> those seem like a better solution.
I don't think there should be a "main" method in a library.
Also, just printing to stdout does not seem very flexible.
Perhaps define a "toString()" method?
> I agree with your assessment of final and immutable for methods, variables
> and classes.
>
> I am removing the FilterConfig constructor for BloomFilter and ensuring the
> BloomFilter makes a defensive copy.
>
> The difference between match() and equals() is that equals() is a bit
> vector equality and match is the standard bloom filter comparison T&C = T.
> Documentation is being added to the match() and inverseMatch() to explain
> this.
>
> STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed). It
> identifies how many bytes the FilterConfig will use when written to a
> DataOuptut or read from a DataInput.
Why is it necessary to know such internal details?
Anyways, I think that persistence is best left to higher-level code.
>
> I think the Utils class can be removed once the other code is moved to
> Java8 streams and not extended iterators.
>
> Hash function is changed to Objects.hash()
>
> Constructors are/will be private where factories make sense. For some the
> simple constructor makes more sense. (e.g. BloomFilter(BitSet) )
See
https://www.baeldung.com/java-constructors-vs-static-factory-methods
and
https://blog.joda.org/2014/03/valjos-value-java-objects.html
> I have no problem removing the Serializable on FilterConfig, but can you
> explain (or point to a post) why it should be avoided?
Same as above (not prejudging how application want to handle serialization).
> The update/build methods are being documented and an explanation of the
> difference included.
Am I wrong that the "build(...)" methods only spare the typing of 2
characters?
If so, it's not worth the increase of the API size.
>
> getLog() is also being documented
>
> The asBitSet() method was to extract the bitset from the bloom filter so
> that it could be stored otherwise used with BitSet based processes. The
> name of the method is being chagned to getBitSet() and it makes a defensive
> copy.
>
> The ProtoBloomFilter is not a factory. It is really a bean and there are
> times when it makes sense to use the ProtoBloomFilter. The name is
> important as it identifies what it is. It contains the necessary
> information to build any configuration of BloomFilter. There is a proposal
> to use ProtoBloomFilters in service calls. It could be defined as an
> interface with a concrete implementation in the ProtoBloomFilterBuilder.
>
> As an example of ProtoBloomFilter usage. Say you have a Person object that
> has the user's first name, last name, eye color, hair color.
>
> The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to
> update() followed by a build() to build the protoBloomFilter.
> ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude"
> ).update( "Warren" ).update( "brown" ).update( "silver" ).build();
I'm still missing why it cannot be a factory, but maybe it is a
question of design habits. What about the following:
public class BloomFilter {
private final BitSet bitSet;
private BloomFilter(BitSet bs) {
bitSet = (BitSet) (bs.clone());
}
/** Factory method. (cf. "ValJO") */
public BloomFilter of(BitSet bs) {
return new BloomFilter(bs);
}
// Combined "ProtoBloomFilter" and "ProtoBloomFilterBuilder" logic.
public static class Builder {
private final Set<Hash> hashes = new HashSet<>();
private Builder() {}
/** Factory method(s). */
public Builder with(Builder other) {
hashes.addAll(other.hashes);
}
public Builder with(ByteBuffer b) { /* ... */ }
// etc.
/** Factory (converter). */
public BloomFilter create(FilterConfig cfg) {
final BitSet set = new BitSet(cfg.getNumberOfBits());
for (Hash hash : hashes) {
hash.populate(set, cfg);
}
return BloomFilter.of(set);
}
}
}
Perhaps I'm taking too many shortcuts... But, as pointed out by
Gary, more coverage is required; and unit tests will help figure out
the leanest design.
>
> I can create a bloom filter configuration that has 1x10^7 items and
> accepts 1 / 1x10^6 collisions.
> FilterConfig config1 = new FilterConfig( 10000000, 1000000)
>
> and build the BloomFilter
> BloomFilter filter1 = proto.create( config1 )
>
> I can use that filter to determine if the person is in a collection of
> 1x10^7 items. The implementation of that collection could be multiple
> buckets of 1x10^4 items with a collision rate of 5x10^3 items. To check
> that I create an appropriate FilterConfig
> FilterConfig config2 = new FilterConfig( 10000, 5000 )
>
> and then create the proper bloom filter
> BloomFilter filter2 = proto.create( config2 )
>
With my above attempt at an alternative API, we'd have (untested):
BloomFilter.Builder proto = BloomFilter
.with("Claude")
.with("Warren")
.with("brown").
.with("silver");
BloomFilter filter1 = proto.create(config1);
BloomFilter filter2 = proto.create(config2);
IIUC, in your current implementation of "ProtoBloomFilter", there
is a side-effect of "build()" (i.e. the call to "clear()"); thus calling
"build()" twice will not yield the same result. I don't think that's
a desirable feature.
Regards,
Gilles
> Claude
>
>
> On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <[email protected]>
> wrote:
>
> > > > [...]
> > >
> > > Gilles,
> > >
> > > Take a look at the PR, I added comments to allow the PR to have 0 deps.
> > >
> > > Gary
> > >
> > >
> >
> > IMO, the package should be named "bloomfilter" (without "s").
> > Naming seems inconsistent in [Collections]: Some (package)
> > names are singular, other plural.
> >
> > * Indent must be 4 spaces.
> > * All fields and methods must be commented.
> > * "FilterConfig" contains a "main" method.
> > * Local variables and fields must be declared "final" whenever constant.
> > * Some methods are declared "final", others not.
> > * "BloomFilter" should be made immutable.
> > * "BloomFilter" could do without a second constructor: "FilterConfig"
> > should have a method that returns a "BitSet".
> > * "BloomFilter" must make a defensive copy of its "bitSet" argument.
> > * What's the difference between "match" and "equals"?
> > * Why is "STORED_SIZE" public?
> > * "Utils" classes should be avoided.
> > * Hash function should be imported/shaded from "Commons Codec"
> > (and defined there if missing).
> > * Constructor(s) should be private (instantiation through factory
> > methods (cf. ValJO).
> > * "Serializable" should be avoided.
> > * The "update" methods are missing explanations (or reference) about
> > their purpose. Also, it seems that "update" and "build" are redundant.
> > * Does "getLog" return a standard value? If so, the reference should
> > appear in the Javadoc (not just as code comment).
> > * What is the purpose of method "asBitSet()"?
> > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory".
> > "ProtoBloomFilterBuilder" should be a static inner class of that
> > factory.
> >
> > Regards,
> > Gilles
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]