I donno if this decay operation is a standard practice around Bloom filters, but it presumably works as follows.
If you've two filters with *distinct* hash functions on the same elements, rather than new elements, then together they act like a bloom filter with twice the bit width as many hash functions, which should remain optimal*, but you could now throw away the second filter to save memory and increase your false positive rate. You'd probably "freeze" the first filter and use only the second freshly cleaned filter for writing, so you membership test switches from being in both filters (and) to being in either one (or). I could imagine doing this if traffic on the old server key slowed down as it approaches expiry. You go from dropping only p^2 false replay drops to between p and 1-(1-p)^2 as the second filter fills up, where p is the false drop for one filter. Sounds reasonable, Jeff * In the naive analysis, if I've two distinct filters on the same elements with the same values for p and m/n, then together they've effectively p^2 and 2*m/n because 2 ln(p) = ln(p^2), so trashing one gives you one filter with parameters p and m/n. I've sen articles that improve on the usual analysis for Bloom filters. If I recall, they show that you lose some efficiency doing this, but probably not enough to care about. It's maybe another reason why you'd only want to decay once though. On Fri, 2015-10-09 at 21:10 +0000, str4d wrote: > Jeff Burdges wrote: > > p.s. After writing this, I noticed that bizarrely I2P actually > > does roughly this : > > https://geti2p.net/en/docs/tunnels/implementation It seems odd to > > do so since they've circuits, but maybe messages can get out of > > order in their tunnels or something. Or maybe they're being > > unclear. > > Quoting that documentation page: > > "Even though the tunnels within I2P bear a resemblance to a circuit > switched network, everything within I2P is strictly message based - > tunnels are merely accounting tricks to help organize the delivery of > messages. No assumptions are made regarding reliability or ordering > of > messages, and retransmissions are left to higher levels (e.g. I2P's > client layer streaming library). This allows I2P to take advantage of > throttling techniques available to both packet switched and circuit > switched networks." > > In I2P's case, replay protection is a) good for the routers, and b) > necessary to prevent certain classes of attacks. It does put upper > limits on the bandwidth that routers can share with the network, so > to > balance the Bloom filter memory requirements we adjust the parameters > depending on what the router is configured to share. Our absolute > maximum was 4MBps for a long time, but after requests from users that > were filling that easily, our current maximum shared bandwidth is > 16MBps > . > > See [0] for our Bloom filter impl, and an analysis of false positive > rates. See [1] for how we select the size of the Bloom filter based > on > the shared bandwidth and configured memory. See [2] for the decaying > hash set that we use in several other places instead of a Bloom > filter. > > str4d > > [0] > https://github.com/i2p/i2p.i2p/blob/master/router/java/src/net/i2p/ro > ute > r/util/DecayingBloomFilter.java > [1] > https://github.com/i2p/i2p.i2p/blob/master/router/java/src/net/i2p/ro > ute > r/tunnel/BloomFilterIVValidator.java > [2] > https://github.com/i2p/i2p.i2p/blob/master/router/java/src/net/i2p/ro > ute > r/util/DecayingHashSet.java > _______________________________________________ > Messaging mailing list > [email protected] > https://moderncrypto.org/mailman/listinfo/messaging
signature.asc
Description: This is a digitally signed message part
_______________________________________________ Messaging mailing list [email protected] https://moderncrypto.org/mailman/listinfo/messaging
